Google Supercharges Gemini 3 Flash with Agentic Vision

Gemini 3 Flash now includes agentic vision, which combines visual reasoning with executable Python code to ground answers in visual evidence. Instead of analyzing an image in a single pass, the model plans a multi-step investigation. It then generates and runs Python code to crop, zoom, annotate, or calculate, and appends each transformed image to its context before finalizing an answer. This iterative think->act->observe loop enables finer inspection of small details, and it supports object counting via drawn bounding boxes and labels. Visual arithmetic and data visualization are executed deterministically in Python and Matplotlib, which reduces hallucinations in image-based math. Google reports roughly 5–10% accuracy improvements across vision benchmarks and says the approach unlocks new AI-driven behaviors.
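The benefit of doing image-based math in code rather than "in the head" can be shown with a minimal Python sketch. This is not Google's implementation. It assumes values have already been transcribed from an image as strings, and the labels and numbers are made up for illustration. The hypothetical `summarize` helper computes totals exactly with `decimal.Decimal` instead of estimating them. The Matplotlib plotting step the article mentions is omitted so the sketch stays standard-library only.

```python
from decimal import Decimal
from typing import Dict, List


def summarize(readings: Dict[str, str]) -> Dict[str, Decimal]:
    """Exact statistics over values transcribed from an image (hypothetical helper).

    Values arrive as strings, as they would after reading visible labels off a
    chart or table, and are parsed with Decimal so the arithmetic is exact.
    """
    if not readings:
        raise ValueError("no readings to summarize")
    nums: List[Decimal] = [Decimal(v) for v in readings.values()]
    total = sum(nums, Decimal(0))
    first, last = nums[0], nums[-1]  # dicts keep insertion order
    return {
        "total": total,
        "mean": total / len(nums),
        # percent change from the first reading to the last (first must be non-zero)
        "pct_change": (last - first) / first * 100,
    }


# Illustrative values only, not data from the article.
readings = {"Q1": "12.5", "Q2": "15.0", "Q3": "10.0", "Q4": "20.0"}
stats = summarize(readings)
print(stats["total"], stats["mean"], stats["pct_change"])  # 57.5 14.375 60.0
```

Because every step is ordinary code, the same inputs always yield the same totals. An agent can therefore re-run or check the numbers instead of trusting a single guess.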

"Google has added agentic vision to Gemini 3 Flash, combining visual reasoning with code execution to 'ground answers in visual evidence'. According to Google, this not only improves accuracy, but more importantly unlocks entirely new AI-driven behaviors. Briefly, rather than analyzing an image in a single pass, Gemini 3 Flash now approaches vision as an agent-like investigation: planning steps, manipulating the image, and using code to verify details before answering."

"This leads to a 'think -> act -> observe' loop, in which the model first analyzes the prompt and the image to plan a multi-step approach; then it generates and executes Python code to manipulate the image and extract additional information from it, such as cropping, zooming, annotating, or calculating; and finally, it appends the transformed image to its context before producing a new answer."
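The loop can be sketched in plain Python. This is a hypothetical illustration, not the Gemini API. Images are 2D lists of pixel values, and `crop`, `zoom`, `Context`, and `agentic_loop` are made-up names. The "think" step is also simplified to a fixed plan passed in, whereas the real model decides each step itself from the prompt and image.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Image = List[List[int]]          # hypothetical: a grayscale image as rows of pixels
Step = Callable[[Image], Image]  # one deterministic image operation


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width region whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def zoom(img: Image, factor: int) -> Image:
    """Nearest-neighbour upscale: repeat every pixel `factor` times along both axes."""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


@dataclass
class Context:
    """What the (hypothetical) model can see: every image produced so far, plus notes."""
    images: List[Image] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def agentic_loop(img: Image, plan: List[Tuple[str, Step]]) -> Context:
    ctx = Context(images=[img])
    for name, step in plan:
        # think: a real model would choose this step from the prompt and image;
        # here the plan is fixed in advance for illustration.
        ctx.notes.append(f"plan: {name}")
        # act: run deterministic code on the most recent image
        result = step(ctx.images[-1])
        # observe: append the transformed image to the context for the next step
        ctx.images.append(result)
        ctx.notes.append(f"observed {len(result)}x{len(result[0])} image")
    return ctx


image = [[r * 10 + c for c in range(8)] for r in range(8)]
ctx = agentic_loop(
    image,
    [
        ("crop the region of interest", lambda im: crop(im, 2, 3, 2, 2)),
        ("zoom in 3x", lambda im: zoom(im, 3)),
    ],
)
print(len(ctx.images))    # 3: original, cropped, zoomed
print(ctx.images[-1][0])  # [23, 23, 23, 24, 24, 24]
```

Keeping every intermediate image in the context mirrors the "append the transformed image" step. A later step, or the final answer, can then refer back to what each operation revealed.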

"According to Google, this approach yields a 5–10% accuracy improvement on vision tasks across most vision benchmarks, driven by two major factors. First, code execution enables fine-grained inspection of details in an image by zooming into smaller visual elements, such as tiny text, rather than relying on guesses. Gemini can also annotate images by drawing bounding boxes and labels to strengthen its visual reasoning, for example by correctly counting objects."
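Counting by annotation can likewise be sketched with deterministic code. This sketch makes three assumptions: the image is a binary grid, a breadth-first flood fill stands in for whatever detection the model actually performs, and `find_boxes` and `annotate` are hypothetical helpers. Each detected object gets a bounding box and a numeric label, so the count is the number of labeled boxes rather than a guess.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]           # hypothetical binary image: non-zero = object pixel
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def find_boxes(grid: Grid) -> List[Box]:
    """Bounding boxes of 4-connected groups of non-zero pixels, in scan order."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if grid[r][c] and not seen[r][c]:
                # flood-fill one object and grow its bounding box as we go
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def annotate(grid: Grid, boxes: List[Box]) -> List[str]:
    """Render the grid as text and stamp each box's label in its top-left corner."""
    canvas = [["#" if v else "." for v in row] for row in grid]
    for label, (top, left, _bottom, _right) in enumerate(boxes, start=1):
        canvas[top][left] = str(label % 10)  # single digit keeps the text grid aligned
    return ["".join(row) for row in canvas]


objects = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
boxes = find_boxes(objects)
print(len(boxes))  # 3
print("\n".join(annotate(objects, boxes)))
```

The annotated output makes the count checkable: each label corresponds to exactly one box, so a miscount would be visible on the image itself.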

#agentic-vision #gemini-3-flash #visual-reasoning #code-execution #computer-vision

Read at InfoQ
