
"In a recent research paper, OpenAI suggested that the tendency of large language models (LLMs) to hallucinate stems from the way standard training and evaluation methods reward guessing over acknowledging uncertainty. According to the study, this insight could pave the way for new techniques to reduce hallucinations and build more trustworthy AI systems, but not all agree on what hallucinations are in the first place."
"We observe that existing primary evaluations overwhelmingly penalize uncertainty, and thus the root problem is the abundance of evaluations that are not aligned. Suppose Model A is an aligned model that correctly signals uncertainty and never hallucinates. Let Model B be similar to Model A except that it never indicates uncertainty and always "guesses" when unsure. Model B will outperform A under 0-1 scoring, the basis of most current benchmarks."
Hallucinations in large language models originate from errors during pre-training, when models cannot distinguish incorrect statements from facts because they are exposed predominantly to positive examples. Those pre-training errors persist through post-training because common evaluation methods prioritize accuracy and penalize expressions of uncertainty or abstention. Standard 0-1 scoring rewards models that always guess over models that appropriately signal uncertainty, creating an incentive to hallucinate. Reducing hallucinations therefore requires rethinking evaluation design, including penalizing confident errors more heavily and giving credit to appropriate expressions of uncertainty, so that incentives align with truthfulness.
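One way to read the proposed fix is as a change of scoring rule. The sketch below uses assumed numbers rather than the paper's exact scheme: a wrong answer costs a penalty `c`, abstaining scores zero, and guessing only pays off once the model's confidence exceeds c/(1+c).

```python
# Illustrative sketch (assumed penalty value, not the paper's exact scheme):
# scoring that penalizes confident errors makes abstention the better
# strategy below a confidence threshold of c / (1 + c).

def expected_penalized_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score if the model guesses: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

def should_guess(p_correct: float, wrong_penalty: float) -> bool:
    """Guess only when the expected score beats abstaining (which scores 0)."""
    return expected_penalized_score(p_correct, wrong_penalty) > 0.0

penalty = 2.0  # hypothetical: a wrong answer costs 2 points, so guess only if p > 2/3
for p in (0.3, 0.6, 0.7, 0.9):
    verdict = "guess" if should_guess(p, penalty) else "abstain"
    print(f"confidence p={p:.1f}: expected score if guessing = "
          f"{expected_penalized_score(p, penalty):+.2f} -> {verdict}")
```

Under such a rule, the honest, uncertainty-aware model is no longer outscored by the one that always guesses, which is the realignment of incentives the summary describes.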
Read at InfoQ