Apparently, not just yet. The popular chatbot ChatGPT failed an economics exam at NES. An experiment conducted by GURU showed:
- what the chatbot is capable of;
- how it makes elementary mistakes;
- how it solves tasks;
- and how it deceives, guided by the advice of one of Shakespeare's heroes: "Your bait of falsehood takes this carp of truth."
This article is a brief account of the experiment; a more detailed one is available here.
How we conducted the experiment
Our "examination board" included NES Professor Olga Kuzmina, GURU journalist Ekaterina Sivyakova, and editor-in-chief Philip Sterkin. We compiled a set of tasks in English consisting of four blocks, designed to check whether ChatGPT can analyze economic issues and find gaps in scholarly knowledge, solve problems, make forecasts, and give psychological advice.
During the two-hour experiment we asked the artificial intelligence to try on different roles: a professor of economics, a researcher, an economic journalist, and even a tutor. We also asked it to give clear and accurate answers, avoid unnecessary detail, and not provide false information (spoiler: it lied to us anyway). Olga Kuzmina assessed the quality of the answers on economics.
It should be noted that several times during the experiment we had to start a new chat because of technical problems (possibly caused by network quality), which could affect how the AI tracked the context of the conversation. Several times the chatbot froze and reported a technical error while still writing a response.
The tasks for which ChatGPT received a ‘B’ grade
One of the simplest tasks for the chatbot was our request to explain the Black-Scholes option pricing model to high school students. It managed quite well: there were no mistakes, but it used too many terms that would be unclear to school students. An attempt to explain the formula to a 10-year-old went worse: ChatGPT took a creative approach, drawing an analogy with buying a toy, but slipped in the explanation, implicitly equating stocks with options on them, noted Olga Kuzmina.
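For readers who want to see the model the chatbot was asked to explain, here is a minimal sketch of the Black-Scholes price of a European call option in Python; the input values at the bottom are illustrative and not taken from the experiment.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call option.

    S: spot price of the underlying, K: strike price,
    T: time to expiry in years, r: risk-free rate, sigma: volatility.
    """
    d1 = (log(S / K) + (r + sigma ** 2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Illustrative inputs: at-the-money call, one year to expiry
print(round(black_scholes_call(S=100, K=100, T=1.0, r=0.05, sigma=0.2), 2))  # ≈ 10.45
```

The "toy" analogy the chatbot attempted corresponds to the call payoff: the option is the right, not the obligation, to buy the underlying at price K, which is exactly what the formula values.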
The AI did better, on average, with university-level tasks in econometrics. First, we asked ChatGPT to estimate the Fama-French three-factor model for Microsoft stock returns; the model accounts for market risk as well as risks related to company size and value (undervaluation). Based on the analysis, the chatbot had to say whether Microsoft is a growth stock or a value stock, that is, a fast-growing, often technological company or a stable, established one.
The reasoning leading up to the answer was sound, but the result was incorrect. The chatbot also solved another typical problem, on a production function, mostly correctly, though it made a minor mistake in the formula.
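The estimation the chatbot was asked to perform boils down to an ordinary least squares regression of a stock's excess returns on the three factors. The sketch below uses simulated data for illustration only; a real estimate would use Microsoft's excess returns and the factor series from Kenneth French's data library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # ten years of hypothetical monthly observations

# Simulated factor series (illustrative, not real market data)
mkt = rng.normal(0.01, 0.04, n)   # market excess return
smb = rng.normal(0.00, 0.02, n)   # size factor (small minus big)
hml = rng.normal(0.00, 0.02, n)   # value factor (high minus low)

# Simulated excess returns of a "growth-like" stock:
# market beta near 1.1 and a negative loading on HML
ret = 0.002 + 1.1 * mkt + 0.2 * smb - 0.4 * hml + rng.normal(0, 0.01, n)

# OLS estimation of: r - rf = alpha + b*MKT + s*SMB + h*HML + eps
X = np.column_stack([np.ones(n), mkt, smb, hml])
coef, *_ = np.linalg.lstsq(X, ret, rcond=None)
alpha, b, s, h = coef

# Interpretation: a negative HML loading (h < 0) points to a growth
# stock; a positive loading points to a value stock.
print(f"alpha={alpha:.4f}, mkt={b:.2f}, smb={s:.2f}, hml={h:.2f}")
```

The sign of the HML coefficient is precisely the evidence the chatbot needed to classify Microsoft, which is where its otherwise sound reasoning went wrong.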
The tasks that the chatbot failed to solve
Academic research turned out to be a weak spot for the chatbot. It was asked to analyze a body of studies on how women's representation on corporate boards affects firm value and operations, and to find and briefly describe the gaps in this field of economics.
In its first answer, the chatbot cited two relevant and widely cited studies, which moreover had very telling titles containing all the keywords we had mentioned. However, it made mistakes in describing each of them: the AI drew conclusions exactly opposite to the actual ones and mixed up the articles' content and metrics. For example, for some reason the chatbot decided that one of the studies was about social responsibility, philanthropy, and environmental protection, although it is about something completely different, Olga Kuzmina notes.
The chatbot began its second attempt with the statement that empirical evidence suggests that having women on boards of directors is associated with positive outcomes for firms in terms of both value and operations, and went on to justify this idea. Asked about the most influential studies in the field, it produced a list of four, noting that these were "a few examples." Checking the answer brought a big surprise: studies with these titles do exist, but they were written by other authors and published at different times.
The AI showed some "imagination" when asked to discuss research by NES President Shlomo Weber and his co-authors in terms of its value for society. The chatbot wrote that the study shows how, in the United States, a driver's race influences police officers' decisions to search them, so it could inform debates about police reform and racial justice. ChatGPT's conclusion had nothing to do with the actual research, which analyzes strategies for immigrants learning the language spoken by the majority in their destination country.
An attempt to describe the practical value of research by NES Professor Marta Troya-Martinez ended the same way. ChatGPT stated that the study contributes to economics by analyzing the impact of automation on the labor market. In fact, it is research on relational contracts, which rest on the parties' trusting relationship; the study elaborates the theory of managed relational contracts.
Perhaps these errors can be explained by the fact that both of our queries contained links redirecting to PDF documents. So in the next question we included a direct link to the text of a study: a column by European Bank for Reconstruction and Development experts on the consequences of the February 2023 earthquake in Turkey and Syria, whose main ideas the chatbot was asked to highlight. The result was no better. The chatbot produced generalities about how much Turkey has suffered from earthquakes and "quoted" the authors' calls for urgent measures. In fact, the column presents a model-based comparison of the economic impact of the 1999 and 2023 earthquakes, accompanied by data on other countries.
Fiction from ChatGPT
At some point, Olga Kuzmina decided to check whether ChatGPT could help her write an abstract for her own research. We gave the chatbot a link to her study "Gender Diversity in Corporate Boards: Evidence From Quota-Implied Discontinuities" and asked it to come up with a new abstract. The AI failed the task: it wrote about corporate social responsibility, which has nothing to do with the study.
We decided to let the chatbot generate a new answer. This time we did not give it a web link to the study, hoping the AI would understand we were talking about the same research. The attempt ended in complete failure: ChatGPT stated that the work investigates the effect of microplastics on aquatic ecosystems. A third attempt was no more successful: the chatbot returned to the idea of corporate social responsibility.
To rule out the possibility that the links were being misread, we uploaded the full text of the introduction (about 2,000 words) of Olga Kuzmina's study and asked the chatbot first to summarize it in three paragraphs and then to condense those into one. Olga Kuzmina called the three-paragraph version "not bad": "The sentences from the long text were incorporated quite organically, but the main findings of the research were described superficially." The one-paragraph version again contained an error.
Finally, we decided to test ChatGPT's ability to make forecasts. Asked when gender gaps in the economy would be closed, it cited the World Economic Forum's estimate of 135.6 years (our fact-check confirmed the figure). Asked whether humanity can overcome economic inequality, the chatbot replied that it is possible but would require a sustained and concerted effort by policymakers, businesses, and individuals. Its recipe for economic equality: progressive taxation, development of the social security system, investment in education and training, support for small and medium-sized enterprises, and support for firms that promote fair labor practices.
We asked the chatbot to provide facts supporting its position, and it produced five, citing information from Oxfam, the OECD, the IMF, the Pew Research Center, and the Harvard Business Review. We then asked it for direct links to the documents it mentioned. It immediately produced a list of links that looked quite plausible but redirected to non-existent pages. Searching by the keywords contained in the links, we found that the documents and surveys themselves exist, but at other web addresses.
“How many hours a day will people work in 10 years?” was our next question. Finding it difficult to predict "with certainty," the chatbot pointed to "many factors" affecting the number of working hours and listed three of them:
- automation of routine tasks with the help of robotics and AI technologies can reduce the demand for certain types of labor and at the same time create new types of jobs;
- demographic shifts: as the population ages, the number of jobs may decrease;
- social norms: in recent years people have begun to pay more attention to work-life balance and to look for more flexible schedules, which may lead to a shorter working week.
The chatbot concluded: "It is likely that the number of hours worked will continue to evolve."
Final assessment by NES Professor
Olga Kuzmina: "The chatbot gives decent general answers when it waffles, but since it distorts facts almost all the time, I would be wary even of its waffling. For example, in the middle of an apparently reasonable text, completely illogical conclusions or distortions of the basics may appear, for which a student would immediately receive an ‘F’. Perhaps, with more precise sequential queries, ChatGPT could help save time, but in any case a person who understands the topic will be needed to check the text written by the AI. This is not surprising overall, because even people cannot always 'read the Internet' and reliably distinguish scientific facts from fiction. Even more challenging are the issues on which researchers themselves do not always agree with each other… As for problem solving, I think many professors already use ChatGPT to check how 'common' their assignments are."
Conversation for the benefit of students
Toward the end, we asked the chatbot to take on the role of an academic advisor and give five recommendations on how Ph.D. students in economics can overcome the "third-year depression" (an informal name for the period when "courses end and you need to come up with something new yourself, which is very difficult," as NES Visiting Professor and ICREA Research Professor at Universitat Pompeu Fabra Ruben Enikolopov explained). The chatbot's advice:
- seek support from peers – fellow Ph.D. students or recent graduates;
- ask for advice from the academic advisor;
- take a break and relax;
- seek out university mental health services;
- revisit your goals and motivations.
After that, we asked the chatbot to suggest how economics students can cope with burnout and to compile a list of 10 open educational resources offering useful, scientifically sound information on the topic. ChatGPT produced the following: the American Psychological Association's Help Center, the Mayo Clinic's burnout checklist, Harvard Health Publishing of Harvard Medical School, the Mindful.org website, TED Talks, the Coursera learning platform, the World Health Organization, the National Institute of Mental Health (USA), the Anxiety and Depression Association of America, and the International Stress Management Association (UK). To this list of real, operating outlets the chatbot added short descriptions of the information and services they provide.
Finally, we asked ChatGPT to act as an experienced economist and answer the question of how economists deal with impostor syndrome. The chatbot advised recognizing that many economists face this problem; listing one's own strengths and accomplishments; seeking support from colleagues and mentors; continuing to learn and develop skills, including by attending conferences, reading academic journals, and looking for new challenges; taking care of physical and mental health, getting enough sleep, and exercising regularly; and seeking professional help, such as therapy.
You can read the full description of the experiment with screenshots, some other answers of the chatbot and detailed comments by Olga Kuzmina here.