This new Google Gemini model scrolls the internet just like you do - how it works
Briefly

"Users simply have to feed it a prompt in natural language -- such as, "Open Wikipedia, search for 'Atlantis,' and summarize the history of the myth in Western thought." The model will autonomously fetch the URL and screenshots of the requested site to analyze the user interface it needs to act within, and will perform the requested task step by step, all while outlining its reasoning and actions in a text box easily visible to users."
"The preview of Gemini 2.5 Computer Use follows the release of similar web-browsing models from OpenAI and Anthropic. Gemini 2.5 Computer Use runs off an iterative looping function that allows it to keep a record of all of its recent actions within a particular user interface and determine its next action accordingly. So the more tasks that it performs within a particular site, the more context it will have, and the more seamlessly it will function."
Gemini 2.5 Computer Use is a web-browsing AI model built atop Gemini 2.5 Pro that can click, type, scroll, and execute tasks directly within web pages. Users provide a natural-language prompt, and the model autonomously fetches the URL and screenshots of the requested site to analyze the user interface it must act within. It performs the task step by step while displaying its reasoning and actions in a visible text box, and it can ask for confirmation before sensitive operations such as purchases. The system runs an iterative loop that keeps a record of its recent actions within an interface and uses that record to choose its next action, so it gains context the more tasks it performs within a given site. The preview follows similar web-browsing releases from OpenAI and Anthropic. Google has acknowledged weaknesses such as hallucinations, and it previously explored a Chrome extension called Project Mariner.
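The loop described above can be illustrated with a minimal Python sketch. This is not Google's API or implementation: every name here (`Action`, `run_loop`, `scripted_policy`, and the `observe`/`execute`/`confirm` callbacks) is hypothetical, and a scripted stub stands in for the model. It only shows the general shape of an observe, decide, act cycle that keeps a history of actions and pauses for confirmation on sensitive steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str                 # e.g. "click", "type", "scroll", or "done"
    target: str = ""          # UI element or text the action applies to
    sensitive: bool = False   # True for steps such as a purchase


def run_loop(
    goal: str,
    propose: Callable[[str, str, List[Action]], Action],
    observe: Callable[[], str],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[Action]:
    """Observe the UI, pick the next action from the goal and history, act, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = observe()                       # current state of the interface
        action = propose(goal, screenshot, history)  # the model would decide here
        if action.kind == "done":
            break
        if action.sensitive and not confirm(action):
            break                                    # user declined; stop without acting
        execute(action)
        history.append(action)                       # context grows with every step
    return history


def scripted_policy(goal: str, screenshot: str, history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed plan based on how many steps ran."""
    plan = [
        Action("click", "search box"),
        Action("type", "Atlantis"),
        Action("click", "Buy now", sensitive=True),
    ]
    return plan[len(history)] if len(history) < len(plan) else Action("done")


declined = run_loop(
    "demo task",
    scripted_policy,
    observe=lambda: "fake-screenshot",
    execute=lambda action: None,
    confirm=lambda action: False,   # the user refuses the sensitive step
)
approved = run_loop(
    "demo task",
    scripted_policy,
    observe=lambda: "fake-screenshot",
    execute=lambda action: None,
    confirm=lambda action: True,    # the user approves the sensitive step
)
print(len(declined), len(approved))
```

In this sketch the history list is the only memory the stub policy sees, which mirrors the article's point that each completed action adds context for the next decision.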
Read at ZDNET