The BBC published a detailed report evaluating how accurately large language models (LLMs) such as ChatGPT-4o and Google Gemini Standard summarize news content. BBC journalists posed 100 trending news questions and analyzed 362 responses, finding that 51% contained significant inaccuracies, particularly misrepresentations and misquotes. Google Gemini fared worst, with significant issues in over 60% of its responses, while Perplexity performed best at just over 40%. The findings raise concerns about the reliability of AI assistants as sources of accurate news.
Fifty-one percent of responses were judged to have "significant issues" in at least one of the areas assessed, the BBC found. The report concluded that AI assistants cannot currently be relied upon to provide accurate news and that they risk misleading audiences.