Gemini Flash model gets visual reasoning capability
Briefly

"Google has added an Agentic Vision capability to its Gemini 3 Flash model, which the company said combines visual reasoning with code execution to ground answers in visual evidence. The capability fundamentally changes how AI models process images, according to Google. Introduced January 27, Agentic Vision is available via the Gemini API in the Google AI Studio development tool and Vertex AI in the Gemini app."
"By combining visual reasoning andcode execution, the model formulates plans to zoom in, inspect, and manipulate images step-by-step. Until now, multimodal models typically processed the world in a single, static glance. If they missed a small detail-like a serial number or a distant sign-they were forced to guess, Google said. By contrast, Agentic Vision converts image understanding into an active investigation, introducing an agentic, "think, act, observe" loop into image understanding tasks, the company said."
Agentic Vision adds iterative visual reasoning and code execution to Gemini 3 Flash, enabling answers grounded in visual evidence. Introduced January 27, the feature is available through the Gemini API in Google AI Studio and Vertex AI, as well as in the Gemini app. It turns image understanding from a single static glance into an agentic, step-by-step process in which the model plans to zoom into, inspect, and manipulate image regions. Where multimodal models previously had to guess when they missed a small detail, the "think, act, observe" loop lets the model cycle through observations and actions until it can answer from visual evidence.
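In practice, the loop described above maps onto the Gemini API's existing code-execution tool. The following is a minimal sketch using the google-genai Python SDK; the model ID "gemini-3-flash", the example file name, and the premise that enabling code execution on an image prompt is what activates the agentic behavior are assumptions for illustration, not details confirmed by the article.

    # Sketch: image question answered via a "think, act, observe" loop.
    # Assumes the google-genai SDK; model ID and trigger mechanism are
    # illustrative assumptions, not confirmed details.
    from google import genai
    from google.genai import types

    client = genai.Client()  # reads GEMINI_API_KEY from the environment

    with open("shelf.jpg", "rb") as f:  # hypothetical input image
        image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

    response = client.models.generate_content(
        model="gemini-3-flash",  # placeholder model ID
        contents=[image, "Read the serial number on the device in the corner."],
        config=types.GenerateContentConfig(
            tools=[types.Tool(code_execution=types.ToolCodeExecution())],
        ),
    )

    # The response interleaves the loop's steps: executable_code parts hold
    # the crop/zoom code the model wrote ("act"), code_execution_result parts
    # hold what it saw after running it ("observe"), and text parts carry its
    # reasoning and final answer ("think").
    for part in response.candidates[0].content.parts:
        if part.executable_code:
            print("ACT:", part.executable_code.code)
        elif part.code_execution_result:
            print("OBSERVE:", part.code_execution_result.output)
        elif part.text:
            print("THINK/ANSWER:", part.text)

The point of the sketch is that the caller does not orchestrate the zooming or cropping; under this design, the model plans and runs those image manipulations itself inside the code-execution sandbox and grounds its final answer in what the executed code returned.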
Read at InfoWorld