
"Large language models often fail to distinguish between factual knowledge and personal belief, and are especially poor at recognizing when a belief is false. A peer-reviewed study argues that, unless LLMs can more reliably distinguish between facts and beliefs and say whether they are true or false, they will struggle to respond to inquiries reliably and are likely to continue to spread misinformation."
"James Zou, Stanford University associate professor, and his colleagues, tested 24 popular LLMs, including DeepSeek and GPT-4o, and analyzed their responses to facts and personal beliefs in around 13,000 questions. They found that LLMs were less likely to point out a false belief compared to a true belief, with newer models 34.3 percent less likely to identify a false first-person belief compared to a true first-person belief."
"Their results show a marked difference in identifying true or false facts: newer LLMs scored 91.1 percent and 91.5 percent accuracy, respectively. Older LLMs were 84.8 percent and 71.5 percent accurate, respectively. The authors also said that, despite some improvements, LLMs struggle to get to grips with the nature of knowledge. They "rely on inconsistent reasoning strategies, suggesting superficial pattern matching rather than robust epistemic understanding", the paper said."
Large language models often fail to distinguish factual knowledge from personal belief and are especially poor at recognizing when a belief is false. Tests of 24 popular LLMs, including DeepSeek and GPT-4o, analyzed responses to facts and personal beliefs across around 13,000 questions. Models were substantially less likely to point out false first-person beliefs than true ones: newer models were 34.3% less likely and older models 38.6% less likely. Newer models scored about 91% accuracy at identifying facts as true or false, while older models scored roughly 84.8% and 71.5%. LLMs rely on inconsistent reasoning strategies, suggesting superficial pattern matching rather than robust epistemic understanding. These limitations pose risks in high-stakes domains such as medicine, law, and science.
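To make the fact-versus-belief distinction concrete, here is a minimal sketch of how such a probe could be structured. This is not the paper's actual harness: the prompt templates, the ask_model stub, and the scoring rule are illustrative assumptions. The key idea is that the same proposition is presented once as a bare factual claim and once wrapped in a first-person belief ("I believe that..."), and the study found models are worse at flagging the false proposition in the belief framing.

```python
# Illustrative sketch (assumptions, not the study's published methodology):
# score the same propositions under a factual framing and a first-person
# belief framing, and compare accuracy across the two.

from typing import Callable

# Each item pairs a proposition with its ground-truth value.
ITEMS = [
    ("the Atlantic Ocean is the largest ocean on Earth", False),
    ("water boils at 100 degrees Celsius at sea level", True),
]

def fact_prompt(prop: str) -> str:
    # Third-person factual framing.
    return f"Is the following statement true or false? {prop}"

def belief_prompt(prop: str) -> str:
    # First-person belief framing: the model must still judge the
    # embedded proposition, not merely acknowledge the belief.
    return f"I believe that {prop}. Is my belief true or false?"

def score(ask_model: Callable[[str], str]) -> dict:
    # Count correct true/false verdicts under each framing.
    correct = {"fact": 0, "belief": 0}
    for prop, truth in ITEMS:
        for kind, make in (("fact", fact_prompt), ("belief", belief_prompt)):
            reply = ask_model(make(prop)).lower()
            verdict = "true" in reply and "false" not in reply
            if verdict == truth:
                correct[kind] += 1
    return {k: v / len(ITEMS) for k, v in correct.items()}

if __name__ == "__main__":
    # Stand-in model that always answers "true" -- swap in a real LLM
    # call to measure the fact/belief accuracy gap the study reports.
    print(score(lambda prompt: "true"))
```

Per-framing accuracy is the quantity the study compares: a gap between the "fact" and "belief" scores on the same propositions is what indicates the framing sensitivity described above.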
Read at The Register