Google announces Gemini 3.1 Pro, says it's better at complex problem-solving
Briefly

"Google announced improvements to its Deep Think tool last week, and apparently, the "core intelligence" behind that update was Gemini 3.1 Pro. As usual, Google's latest model announcement comes with a plethora of benchmarks that show mostly modest improvements. In the popular Humanity's Last Exam, which tests advanced domain-specific knowledge, Gemini 3.1 Pro scored a record 44.4 percent. Gemini 3 Pro managed 37.5 percent, while OpenAI's GPT 5.2 got 34.5 percent."
"Gemini 3 was a bit behind on this evaluation, reaching a mere 31.1 percent versus scores in the 50s and 60s for competing models. Gemini 3.1 Pro more than doubles Google's score, reaching a lofty 77.1 percent. Google has often gloated when it releases new models that they've already hit the top of the Arena leaderboard (formerly LM Arena), but that's not the case this time."
Google has released Gemini 3.1 Pro in preview for developers and consumers, touting improved problem-solving and reasoning. The model is the "core intelligence" behind last week's Deep Think improvements and posts mostly modest gains across many benchmarks. On Humanity's Last Exam, Gemini 3.1 Pro scored 44.4 percent, versus 37.5 percent for Gemini 3 Pro and 34.5 percent for OpenAI's GPT 5.2. On ARC-AGI-2, which uses novel logic problems, Gemini 3.1 Pro jumped to 77.1 percent from Gemini 3's 31.1 percent. The Arena leaderboard still shows other models ahead in some categories: Claude Opus 4.6 leads in text, and several models outperform Gemini on code. Arena rankings depend on user voting, which can favor superficially convincing outputs.
Read at Ars Technica