A recent study published by the Royal Society reveals alarming inaccuracies in AI chatbot summaries of scientific research, with up to 73% of responses from models such as ChatGPT-4o and LLaMA 3.3 70B found to contain inaccurate conclusions. The research shows that newer models omit key details far more often than older versions, contradicting industry claims of steadily improving accuracy. The findings point to a worrying trend: as use of these chatbots grows, particularly among teens, so does the likelihood that scientific studies will be overgeneralized and misinterpreted, posing significant risks to public understanding of research.
When summarizing scientific texts, LLMs may omit details that limit the scope of the research conclusions, producing generalizations broader than the original study warrants.
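To make this failure mode concrete, here is a minimal, hypothetical sketch in Python. It is not the methodology used in the study, just a crude heuristic that flags when a summary drops scope-limiting qualifiers (species, sample size, cohort) present in the original abstract; the qualifier list and function name are illustrative assumptions.

```python
import re

# Phrases that typically restrict a finding's scope. This list is a
# hypothetical example, not the taxonomy used in the Royal Society study.
SCOPE_QUALIFIERS = [
    r"\bin mice\b",
    r"\bin rats\b",
    r"\bin vitro\b",
    r"\bamong (?:\w+ ){0,3}participants\b",
    r"\bin this (?:cohort|sample|trial)\b",
    r"\bn\s*=\s*\d+",  # sample size, e.g. "n = 24"
]

def dropped_qualifiers(abstract: str, summary: str) -> list[str]:
    """Return scope qualifiers found in the abstract but missing from the summary."""
    missing = []
    for pattern in SCOPE_QUALIFIERS:
        if re.search(pattern, abstract, re.IGNORECASE) and not re.search(
            pattern, summary, re.IGNORECASE
        ):
            missing.append(pattern)
    return missing

abstract = "The drug reduced tumor growth in mice (n = 24) in this trial."
summary = "The drug reduces tumor growth."  # generic claim, scope dropped
print(dropped_qualifiers(abstract, summary))
# Prints the three dropped qualifiers: species, cohort, and sample size.
```

Even this toy check captures the pattern the researchers describe: a past-tense, sample-bound finding is rewritten as a generic present-tense claim once the qualifiers disappear.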
Error rates were found to be higher in newer chatbots than in older ones, the exact opposite of what AI industry leaders have been promising.
For example, use of the two ChatGPT models covered in the study doubled among US teens between 2023 and 2025, from 13 to 26 percent.
The combination of an LLM's tendency to overgeneralize and its widespread use poses a significant risk of large-scale misinterpretation of research findings.