
"Generative AI tools, and the deep research agents and search engines powered by them, frequently make unsupported and biased claims that aren't backed up by the sources they cite. That's according to an analysis which found that about one-third of answers provided by the AI tools aren't backed up by reliable sources. For OpenAI's GPT 4.5, the figure was even higher, at 47 per cent."
"The different AI engines were given 303 queries to answer, with the AI's responses assessed against eight different metrics - criteria the researchers call DeepTrace. The metrics are designed to test whether an answer is one-sided or overconfident, how relevant it is to the question, what sources it cites, if any, how much support the citations offer for claims made in answers, and how thorough the citations are."
Generative AI search engines and deep research agents were evaluated on 303 queries against eight metrics the researchers call DeepTrace. The metrics assess whether an answer is one-sided or overconfident, how relevant it is to the question, which sources it cites, how well those citations support the claims made, and how thorough the citations are. Around one-third of AI answers lacked backing from reliable sources; for OpenAI's GPT 4.5, the figure was 47 per cent. The test set mixed contentious questions, designed to surface bias, with expertise-based questions in meteorology, medicine, and human–computer interaction. Deep research features tested included GPT-5 Deep Research, Bing Chat's Think Deeper, and tools from You.com, Google Gemini, and Perplexity. The results point to frequent unsupported and biased claims, suggesting users should verify AI answers and their cited sources rather than take them at face value.
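To make the citation-support idea concrete, here is a minimal Python sketch of how one might score the fraction of an answer's claims that its citations actually back up. The study's real DeepTrace metrics are not reproduced here; the keyword-overlap test, the names `claim_is_supported` and `unsupported_rate`, and the toy data are all illustrative assumptions.

```python
# Hypothetical sketch of a DeepTrace-style citation-support check.
# Not the study's code: the support test and all names below are assumptions.

from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    cited_source_ids: list[str]  # sources the answer attaches to this claim


def claim_is_supported(claim: Claim, sources: dict[str, str]) -> bool:
    """Toy support test: a claim counts as supported if at least one of its
    cited sources exists and mentions a keyword from the claim."""
    keywords = {w.lower() for w in claim.text.split() if len(w) > 4}
    for sid in claim.cited_source_ids:
        body = sources.get(sid, "").lower()
        if any(k in body for k in keywords):
            return True
    return False


def unsupported_rate(claims: list[Claim], sources: dict[str, str]) -> float:
    """Fraction of claims with no supporting citation -- the kind of figure
    reported as 'about one-third' (or 47% for GPT 4.5) in the study."""
    if not claims:
        return 0.0
    unsupported = sum(not claim_is_supported(c, sources) for c in claims)
    return unsupported / len(claims)


if __name__ == "__main__":
    sources = {
        "s1": "Observed warming trends are documented in meteorological records.",
        "s2": "A review of human-computer interaction studies.",
    }
    claims = [
        Claim("Warming trends appear in meteorological records", ["s1"]),
        Claim("The treatment cures the disease outright", ["s2"]),  # unbacked
    ]
    print(f"unsupported rate: {unsupported_rate(claims, sources):.0%}")
```

A real evaluation would replace the keyword-overlap heuristic with something far stronger, such as an entailment model judging whether each cited passage supports each claim, but the aggregate score would be computed the same way.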
Read at New Scientist