
"Google has officially unveiled Gemini 3 Pro, its new state of the art LLM with record-breaking scores across almost every AI benchmark. The new model is intended to improve every Google service that uses Gemini, including its dedicated app, coding tools, and AI in search. Google stated that Gemini 3 Pro is much better at handling requests in their intended context and providing useful answers that don't resort to flattery. The model debuted in the number one spot across text, WebDev, and vision in , with Google having Gemini 3 Pro is the "best model in the world for complex multimodal understanding"."
"Across multimodal reasoning benchmarks, Gemini 3 Pro was found to consistently outperform competition such as GPT-5.1 and Claude Sonnet 4.5. In MMMU-Pro, for example, the model scored 81% versus GPT-5.1's 76% and Claude Sonnet 4.5's 68%. ARC-AGI-2 is a rigorous benchmark for testing the capability of AI model reasoning across a series of abstract visual puzzles. Easy for humans to complete but difficult for today's LLMs, it's considered a true challenge for frontier models. Gemini 3 Pro scored 31.1% in tests, far in excess of GPT 5.1's 17.6% and Claude Sonnet 4.5's 13.6%."
"In other areas, the performance gap between Google's model and the competition is even more stark. Gemini 3 Pro scored a new record score of 23.4% at MathArena Apex, a benchmark that tests LLMs on their ability to solve mathematical problems upon which they weren't trained, compared to just 1% by GPT-5.1 and 1.6% by Claude Sonnet 4.5. Google has also emphasized how its reasoning and visual understanding helps Gemini 3 Pro to do more with the coding capabilities it has, for quicker overall resolution of common developer tasks."
Gemini 3 Pro is a new state-of-the-art large language model that achieved record-breaking results across nearly every AI benchmark. The model targets improvements across Google services including the dedicated Gemini app, coding tools, and search AI. Gemini 3 Pro consistently outperformed rivals such as GPT-5.1 and Claude Sonnet 4.5 on multimodal reasoning tests like MMMU-Pro and ARC-AGI-2, and set a new top score on MathArena Apex. The model advances contextual understanding, reduces flattery, and leverages enhanced visual and reasoning capabilities to accelerate common developer tasks and improve problem resolution.
Read at IT Pro