
"Their audit encompassed 111 million references in papers and preprints listed in major repositories including arXiv, bioRxiv, Social Science Research Network (SSRN), and PubMed Central servers, and found that there were 146,932 hallucinated citations in material published in 2025 alone."
"The analysis also suggests that the prevalence of hallucinated citations depends on the area of research. SSRN, a preprint server for social sciences research, had the highest rate of hallucinated citations at nearly 2%, almost five times higher than any other major repository."
"To quantify the scale of the problem, the researchers extracted reference titles from millions of manuscripts and checked them against Semantic Scholar, OpenAlex and Google Scholar. References that could not be matched, and that an LLM judged to be intended as academic sources, were flagged as unmatched."
"Because bibliographic errors have always existed, the researchers only counted faulty references appearing in material published after 2022, the year in which ChatGPT, the first publicly available LLM, was launched."
Researchers audited 2.5 million papers and preprints to measure hallucinated citations. The audit covered 111 million references across major repositories including arXiv, bioRxiv, SSRN, and PubMed Central, and found 146,932 hallucinated citations in material published in 2025 alone. Prevalence varied by research area: SSRN, a social-science preprint server, showed the highest rate at nearly 2%, about five times higher than any other major repository. The method extracted reference titles from manuscripts and matched them against Semantic Scholar, OpenAlex, and Google Scholar; references that could not be matched, and that an LLM judged were intended as academic sources, were flagged. Because bibliographic errors predate LLMs, only faulty references in material published after 2022, the year ChatGPT launched, were counted.
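The flagging logic described above can be sketched in a few lines. This is a hypothetical illustration, not the study's actual pipeline: the bibliographic index and the LLM judgment are stubbed out with a toy set and a placeholder function, whereas the researchers queried Semantic Scholar, OpenAlex, and Google Scholar and used an LLM as the judge.

```python
# Hypothetical sketch of the citation-audit logic: a reference counts as
# hallucinated only if (a) no bibliographic database matches its title AND
# (b) a judge (an LLM in the study) says it was intended as an academic source.

def normalize(title: str) -> str:
    """Lowercase and strip punctuation so near-identical titles still match."""
    return "".join(c for c in title.lower() if c.isalnum() or c.isspace()).strip()

def is_hallucinated(ref_title: str, known_titles: set, looks_academic) -> bool:
    """Flag a reference that is both unmatched and intended as academic."""
    unmatched = normalize(ref_title) not in known_titles
    return unmatched and looks_academic(ref_title)

# Toy index standing in for Semantic Scholar / OpenAlex / Google Scholar.
index = {normalize(t) for t in [
    "Attention Is All You Need",
    "Deep Residual Learning for Image Recognition",
]}

# Placeholder for the LLM judgment; always True here for the demo.
judge = lambda title: True

print(is_hallucinated("Attention Is All You Need", index, judge))      # False: matched
print(is_hallucinated("A Survey of Imaginary Results", index, judge))  # True: unmatched
```

In the real audit, "unmatched" came from failed title lookups across three databases, which is why normalization matters: trivial punctuation or casing differences should not cause a genuine reference to be flagged.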
Read at Nature