Because of the way cutting-edge chatbots are built, they tend to present all their claims with uniform confidence—regardless of subject matter or accuracy. There is no difference to a language model between something that is true and something that's not, says AI researcher Andreas Kirsch.
Hallucinations have proved elusive and persistent, but computer scientists are refining ways to detect them in a large language model, or LLM. A new project aims to check an LLM's output for suspected flubs by running it through another LLM, examining multiple answers from the first system for consistency.
Having AI systems cross-examine one another isn't a new idea, but Kossen and his colleagues have surpassed previous benchmarks for spotting hallucinations. Their approach gauges the system's uncertainty by sampling multiple responses to the same prompt and comparing how consistent they are with one another.
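To make the idea concrete, here is a minimal sketch in Python of that kind of consistency check. It assumes you can collect several answers to the same question and that a second model can judge whether two answers mean the same thing; the `same_meaning` checker and the toy example are hypothetical stand-ins for illustration, not the researchers' actual system or any specific vendor API.

```python
# Sketch of a consistency-based hallucination score: sample several answers,
# group the ones that mean the same thing, and measure how spread out the
# groups are. Answers that all agree score near zero; scattered answers score high.
import math
from collections import Counter
from typing import Callable, List


def semantic_clusters(answers: List[str],
                      same_meaning: Callable[[str, str], bool]) -> List[int]:
    """Greedily group answers that the checker judges to be equivalent."""
    cluster_ids: List[int] = []
    representatives: List[str] = []
    for ans in answers:
        for idx, rep in enumerate(representatives):
            if same_meaning(ans, rep):
                cluster_ids.append(idx)
                break
        else:  # no existing cluster matched, start a new one
            representatives.append(ans)
            cluster_ids.append(len(representatives) - 1)
    return cluster_ids


def consistency_entropy(cluster_ids: List[int]) -> float:
    """Entropy over meaning-clusters: low = consistent, high = likely flub."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    # Toy stand-in for the second model: call two answers equivalent if the
    # shorter one appears inside the longer one. A real system would query an LLM.
    def toy_checker(a: str, b: str) -> bool:
        a, b = a.lower(), b.lower()
        return a in b or b in a

    consistent = ["Paris", "It is Paris.", "Paris, France"]
    scattered = ["Paris", "Lyon", "Marseille"]
    print(consistency_entropy(semantic_clusters(consistent, toy_checker)))  # ~0.0
    print(consistency_entropy(semantic_clusters(scattered, toy_checker)))   # ~1.1
```

The key design choice is that disagreement is measured over meanings rather than exact wording, so differently phrased but equivalent answers still count as agreement.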