This week, OpenAI released a significant study that systematically analyzes the root causes of hallucination in large language models. The research argues that current training and evaluation practices reward models for guessing rather than for acknowledging uncertainty, and that this is a major factor behind AI confidently generating incorrect information.
AI hallucination refers to a model confidently generating statements that sound plausible but are actually incorrect, a problem that remains a barrier to users fully trusting AI systems. In the paper, OpenAI explicitly defines hallucination as "the situation where a model confidently generates an inaccurate answer," and notes that the phenomenon persists even in its latest models such as GPT-5.
In the study, paper author Adam Tauman Kalai ran tests in which different chatbots, asked for the title of his doctoral dissertation, each confidently supplied an incorrect answer, underscoring how widespread the problem is.
The OpenAI team argues that hallucination persists in part because current evaluation methods create flawed incentives: most approaches to assessing model performance encourage models to guess rather than to honestly confront uncertainty.
The situation resembles a multiple-choice exam: if you do not know the answer, a random guess still has some chance of being right, whereas leaving the question blank guarantees a zero. When a model is scored on accuracy alone, it is therefore pushed to guess rather than to admit "I don't know."
The research shows that across thousands of test questions, models that adopt a guessing strategy end up scoring higher on leaderboards than models that cautiously acknowledge uncertainty. OpenAI adds that declining to answer is part of humility, one of the company's core values.
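To make the incentive concrete, here is a minimal simulation, using assumed probabilities rather than figures from the paper, that compares two answering policies under an accuracy-only scorer: a model that blindly guesses whenever it is unsure, and one that abstains.

```python
import random

# Minimal sketch (illustrative assumptions, not data from the paper):
# compare two answer policies when the benchmark counts only accuracy
# and an abstention earns the same zero as a wrong answer.
random.seed(0)

N_QUESTIONS = 10_000
P_KNOWN = 0.6          # assumed fraction of questions the model actually knows
P_LUCKY_GUESS = 0.25   # assumed chance that a blind guess happens to be right

def accuracy_only_score(knows_answer: bool, guesses_when_unsure: bool) -> int:
    """Accuracy-only scoring: 1 for a correct answer, 0 otherwise."""
    if knows_answer:
        return 1
    if guesses_when_unsure:
        return 1 if random.random() < P_LUCKY_GUESS else 0
    return 0  # an honest "I don't know" earns nothing under this scorer

def average_score(policy_guesses: bool) -> float:
    total = sum(
        accuracy_only_score(random.random() < P_KNOWN, policy_guesses)
        for _ in range(N_QUESTIONS)
    )
    return total / N_QUESTIONS

print(f"always-guess policy:  {average_score(True):.3f}")   # ~0.70
print(f"abstain-when-unsure:  {average_score(False):.3f}")  # ~0.60
```

Under this scorer the guessing policy lands near 0.70 while the abstaining policy stays near 0.60, even though every extra point the guesser earns comes from answers it could not actually justify.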
Despite the persistence of hallucination, OpenAI's latest model, GPT-5, has made significant progress in reducing it. According to the GPT-5 system card, its hallucination rate is roughly 26% lower than GPT-4's, with strong results across a range of assessments.
On the LongFact-Concepts and LongFact-Objects tests, GPT-5's hallucination rates were just 0.7% and 0.8%, far below OpenAI o3's 4.5% and 5.1%. In high-stakes scenarios such as medical queries, GPT-5's hallucination rate was only 1.6%, a clear advantage in that domain.
OpenAI's proposed fix is straightforward: penalize confident errors more heavily than expressions of uncertainty, and give partial credit for appropriately acknowledging uncertainty. The research team stresses that adding a handful of new uncertainty-aware tests is not enough; the evaluation methods already in wide use need to be updated.
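As a hedged sketch of what such a scoring rule could look like, with weights chosen purely for illustration rather than taken from the paper, the expected value of guessing turns negative once confident errors cost more than abstaining:

```python
# Sketch of a confidence-aware scorer in the spirit of OpenAI's proposal:
# a confident error costs more than an abstention, and abstention receives
# partial credit. The specific weights below are illustrative assumptions.

P_LUCKY_GUESS = 0.25   # assumed chance that a blind guess is correct
CORRECT = 1.0          # reward for a correct answer
ABSTAIN = 0.3          # partial credit for saying "I don't know"
WRONG = -1.0           # confident error penalized harder than abstention

# Expected score on a question the model does NOT know the answer to:
expected_if_guessing = P_LUCKY_GUESS * CORRECT + (1 - P_LUCKY_GUESS) * WRONG
expected_if_abstaining = ABSTAIN

print(f"expected score if guessing:   {expected_if_guessing:+.2f}")   # -0.50
print(f"expected score if abstaining: {expected_if_abstaining:+.2f}")  # +0.30
```

With these assumed weights, guessing on an unknown question has an expected score of -0.50 while abstaining earns +0.30, so the incentive to bluff disappears.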
The paper points out that as long as the main evaluation metrics keep rewarding lucky guesses, models will keep learning to guess. Updating those metrics, by contrast, would widen the adoption of techniques that reduce hallucination.
The study also analyzes how hallucinations arise during pre-training. OpenAI explains that language models learn by predicting the next word across vast amounts of text. Unlike traditional machine learning setups, the sentences carry no true/false labels; the model sees only positive examples of fluent language.
The team illustrates the point with an image-recognition analogy: if photos were labeled with each pet's birthday, even the best algorithm would err, because birthdays are essentially random. Likewise, errors on things that follow consistent patterns, such as spelling and punctuation, shrink as the data grows, but arbitrary, low-frequency facts like a pet's birthday cannot be predicted from patterns alone, and hallucinations emerge.
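The contrast can be sketched with a toy calculation, using assumed numbers rather than anything from the paper: error on a rule-governed task falls quickly as examples accumulate, while error on an arbitrary one-off fact barely moves.

```python
# Toy illustration (assumed numbers, not the paper's experiment): patterned
# tasks become learnable with more data, but arbitrary one-off facts such as
# a pet's birthday stay unpredictable no matter how much other data is seen.

def patterned_error_rate(n_examples: int, n_patterns: int = 1000) -> float:
    """Rough error for a rule-governed task (e.g. spelling): roughly the
    probability that a given pattern was never seen during training."""
    return (1 - 1 / n_patterns) ** n_examples

def arbitrary_fact_error_rate(n_examples: int) -> float:
    """Error for a random fact seen at most once: guessing a date stays
    near-random regardless of how many unrelated examples are added."""
    return 364 / 365

for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7}: pattern error {patterned_error_rate(n):.4f}, "
          f"birthday error {arbitrary_fact_error_rate(n):.4f}")
```

However much the corpus grows, the birthday-style error stays near chance, which is why, as the researchers note, such random low-frequency facts remain a source of hallucination.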
At the same time, OpenAI is restructuring its Model Behavior team, a research group of roughly 14 people responsible for shaping how its AI models interact with humans. According to an internal memo, the team will be folded into the post-training team and report to Max Schwarzer, OpenAI's head of post-training.
Joanne Jang, the team's founding leader, is moving on to launch a new project called OAI Labs, which will focus on inventing and prototyping new interfaces for human-AI collaboration. The team has previously worked on shaping model personality traits, reducing sycophantic behavior, and addressing issues such as political bias.
The study offers an important theoretical foundation for understanding AI hallucination and should help push evaluation standards forward across the industry. OpenAI says it will keep working to further reduce the rate of confident errors in its language models' outputs.
Industry observers believe that, with improved evaluation methods and new techniques, AI hallucination can be brought under better control, further strengthening the trustworthiness and usefulness of AI systems.



