
"Although Google refers to it as an update to 3 Pro, the improvements are significant. At an earlier stage of GenAI, this would have been given a completely new number, with significantly stronger reasoning capabilities. This is evident from the ARC-AGI-2 benchmark, in which 3 Pro scores 31.1 percent and 3.1 Pro surpasses that by a factor of two: 77.1 percent."
"The release of the Llama 4 model in April last year already showed that benchmarks don't tell the whole story. At the time, Meta seemed competitive with Google, OpenAI, and Anthropic, but actual users were generally disappointed. We have not yet seen such a contrast between theory and practice from Google. That is why the scores of 3.1 Pro, which are generally slightly better than those of 3 Pro, seem to represent a meaningful upgrade on several fronts."
"More consistent is the challenge of Claude, where Opus 4.6 encodes practically as well as 3.1 Pro. The SWE-Bench Verified for agentic coding is 80.6 percent for 3.1 Pro and 80.8 percent for Opus 4.6. This Claude model also finds it easier to use tools than the latest Gemini model, while expert tasks are best performed by Sonnet 4.6."
Google is rolling out Gemini 3.1 Pro across the Gemini app, NotebookLM for paying users, and in preview via the Gemini API. The update yields a major jump on ARC-AGI-2, from 31.1 percent for 3 Pro to 77.1 percent for 3.1 Pro. The MMMU Pro multimodal score slipped slightly from 81.0 to 80.5 percent. Agentic coding performance is nearly identical between Gemini 3.1 Pro (80.6 percent) and Claude Opus 4.6 (80.8 percent). Claude Sonnet 4.6 leads on expert tasks, and Opus handles tool use more easily, signaling close competition rather than clear dominance.