
"The model will autonomously fetch the URL and screenshots of the requested site to analyze the user interface it needs to act within, and will perform the requested task step by step, all while outlining its reasoning and actions in a text box easily visible to users. It may also respond by asking for confirmation if it's instructed to perform a sensitive task, like making a purchase."
"Gemini 2.5 Computer Use runs off an iterative looping function that allows it to keep a record of all of its recent actions within a particular user interface and determine its next action accordingly. So the more tasks that it performs within a particular site, the more context it will have, and the more seamlessly it will function. Google posted demo videos (sped up 3x) showing the model autonomously making an update in a customer relationship management site and rearranging notes on Google's Jamboard platform."
Google DeepMind released Gemini 2.5 Computer Use, an AI model built on Gemini 2.5 Pro that interacts with web pages by clicking, typing, and scrolling to carry out multi-step tasks. Users provide natural-language prompts; the model fetches the URL and screenshots of the requested site, analyzes the interface, and performs the task step by step while displaying its reasoning and actions in a visible text box. The model can ask for confirmation before sensitive operations such as purchases. It runs in an iterative loop that records its recent UI actions and uses them to determine the next step, so it gains context the longer it works within a site. Google acknowledges weaknesses, including hallucinations. The release follows similar releases from OpenAI and Anthropic.
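The loop described above (observe the interface, pick the next action from the record of recent actions, pause for confirmation on sensitive steps) can be illustrated with a minimal Python sketch. This is not Google's API or implementation: `Action`, `run_agent_loop`, `demo`, and the callbacks `observe`, `decide`, `execute`, and `confirm` are all hypothetical names invented for illustration, and the "model" here is a hard-coded stub operating on a toy page.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical UI action; not a real Gemini type."""
    kind: str                 # e.g. "click", "type", "scroll", or "done"
    target: str = ""          # element the action applies to
    sensitive: bool = False   # True for steps like a purchase


def run_agent_loop(goal, observe, decide, execute, confirm,
                   max_steps=10, history_size=5):
    """Observe, decide, (confirm), execute; repeat until done or out of steps."""
    history = deque(maxlen=history_size)  # bounded record of recent actions
    log = []                              # stands in for the visible text box
    for step in range(max_steps):
        screenshot = observe()
        action = decide(goal, screenshot, list(history))
        log.append(f"step {step}: {action.kind} {action.target}".strip())
        if action.kind == "done":
            break
        # Sensitive actions require explicit user approval before running.
        if action.sensitive and not confirm(action):
            log.append("stopped: user declined")
            break
        execute(action)
        history.append(action)
    return log


def demo(approve):
    """Toy run: add an item to a cart, then attempt a (sensitive) purchase."""
    page = {"cart": [], "purchased": False}

    def observe():
        # A real system would capture a screenshot; a string stands in here.
        return f"cart={page['cart']} purchased={page['purchased']}"

    def decide(goal, screenshot, history):
        # Stub policy: the next step depends only on how many actions ran.
        if not history:
            return Action("click", "add-to-cart")
        if len(history) == 1:
            return Action("click", "buy-now", sensitive=True)
        return Action("done")

    def execute(action):
        if action.target == "add-to-cart":
            page["cart"].append("item")
        elif action.target == "buy-now":
            page["purchased"] = True

    log = run_agent_loop("buy the item", observe, decide, execute,
                         confirm=lambda action: approve)
    return page, log
```

Calling `demo(True)` runs the purchase to completion, while `demo(False)` stops at the confirmation step and leaves the toy page unpurchased. The bounded `deque` is one simple way to model "recent actions"; how the real system stores or limits its history is not stated in the source.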
Read at ZDNET