
"Another week, another "smarter" model -- this time from Google, which just released Gemini 3.1 Pro. Gemini 3 outperformed several competitor models since its release in November, beating Copilot in a few of our in-house task tests, and has generally received praise from users. Google said this latest Gemini model, announced Thursday, achieved "more than double the reasoning performance of 3 Pro" in testing, based on its 77.1% score on the ARC-AGI-2 benchmark for "entirely new logic patterns.""
"The latest model follows a "major upgrade" to Gemini 3 Deep Think last week, which boasted new capabilities in chemistry and physics alongside new accomplishments in math and coding, according to Google. The company said the Gemini 3 Deep Think upgrade was built to address "tough research challenges -- where problems often lack clear guardrails or a single correct solution and data is often messy or incomplete." Google said Gemini 3.1 Pro undergirds that science-heavy investment, calling the model the "upgraded core intelligence that makes those breakthroughs possible.""
"Late last year, Gemini 3 scored a new high of 38.3% across all currently available models on the Humanity's Last Exam (HLE) benchmark test. Developed to combat increasingly beatable industry-standard benchmarks and better measure model progress against human ability, HLE is meant to be a more rigorous test, though benchmarks alone aren't sufficient to determine performance. According to Google, Gemini 3.1 Pro now bests that score at 44.4% -- though the Deep Think upgrade technically scored higher at 48.4%. Similarly, the Deep Think update scored 84.6% --"