
"Another week, another "smarter" model -- this time from Google, which just released Gemini 3.1 Pro. Gemini 3 outperformed several competitor models since its release in November, beating Copilot in a few of our in-house task tests, and has generally received praise from users. Google said this latest Gemini model, announced Thursday, achieved "more than double the reasoning performance of 3 Pro" in testing, based on its 77.1% score on the ARC-AGI-2 benchmark for "entirely new logic patterns.""
"The latest model follows a "major upgrade" to Gemini 3 Deep Think last week, which boasted new capabilities in chemistry and physics alongside new accomplishments in math and coding, according to Google. The company said the Gemini 3 Deep Think upgrade was built to address "tough research challenges -- where problems often lack clear guardrails or a single correct solution and data is often messy or incomplete." Google said Gemini 3.1 Pro undergirds that science-heavy investment, calling the model the "upgraded core intelligence that makes those breakthroughs possible.""
"Late last year, Gemini 3 scored a new high of 38.3% across all currently available models on the Humanity's Last Exam (HLE) benchmark test. Developed to combat increasingly beatable industry-standard benchmarks and better measure model progress against human ability, HLE is meant to be a more rigorous test, though benchmarks alone aren't sufficient to determine performance. According to Google, Gemini 3.1 Pro now bests that score at 44.4% -- though the Deep Think upgrade technically scored higher at 48.4%. Similarly, the Deep Think update scored 84.6% --"