Google researchers find the best AI model is 69% right
Briefly

"We just got a sobering picture of how often AI models get their facts straight. This week, Google DeepMind introduced the FACTS Benchmark Suite, which measures how reliably AI models produce factually accurate answers. It tests models in four areas: answering factoid questions from internal knowledge, using web search effectively, grounding responses in long documents, and interpreting images. The best model, Google's Gemini 3 Pro, reached 69% accuracy, with other leading models falling well below that."
"Beyond journalism, this number should matter to businesses betting on AI. While models excel at speed and fluency, their factual reliability still lags far behind human expectations, especially in tasks involving niche knowledge, complex reasoning, or precise grounding in source material. Even small factual errors can have outsized consequences in sectors such as finance, healthcare, and the law. This week, my talented colleague Melia Russell looked at how law firms are handling the rise of AI models as a source of legal truth."
Google DeepMind introduced the FACTS Benchmark Suite to measure how reliably AI models produce factually accurate answers across four areas: answering factoid questions from internal knowledge, using web search effectively, grounding responses in long documents, and interpreting images. The top model, Gemini 3 Pro, reached 69% accuracy, while other leading models scored well below that. Factual reliability still lags behind speed and fluency, which creates risk for businesses that rely on AI in journalism, finance, healthcare, and law. Even small factual errors can have outsized consequences; one law firm fired an employee after a document contained fabricated cases generated by ChatGPT. The benchmark quantifies these failures and provides a roadmap for improvement.
Read at Business Insider