A study testing eight AI chatbots found that they produced incorrect citations 60% of the time on average. The best performer, Perplexity, cited incorrectly 37% of the time, while the worst, Grok 3, gave inaccurate citations 94% of the time. The chatbots often presented their incorrect answers confidently, and paid versions were no more reliable, raising concerns about their trustworthiness. The study also highlighted a worrying trend: the web crawlers behind these AI tools frequently bypassed publishers' paywalls, further complicating the landscape of information sourcing and attribution.
The findings point to a troubling pattern: while AI chatbots are impressive at generating content, they often fail to source it accurately. Even Perplexity, the most accurate of the eight models tested, was wrong 37% of the time, indicating a need for improvement across the board.