fromdaverupert.com
2 weeks agoWeb development
The duality of language models in the browser
Small language models running in browsers offer low-cost, private, accessible AI without per-token fees or backend requirements.
If you've ever used tools like PhonicMind or LALAL.AI, you know the drill: Upload your MP3. Wait in a queue. Pay for "credits" or high-quality downloads. Your file sits on someone else's server. For musicians, producers, or just karaoke fans, this is slow and privacy-invasive.
The browser would have more value if it included an on-device AI model that could run without requiring access to the internet, O'Donnell said. "This provides a channel through which they can get hundreds of millions of people to download their model," he said. In that scenario, the browser could access heavyweight AI models in the cloud to handle more demanding tasks.
Google is previewing a new Gemini AI model designed to navigate and interact with the web via a browser, letting AI agents do things inside interfaces designed for use by people and not robots. The model, called Gemini 2.5 Computer Use, uses "visual understanding and reasoning capabilities" to analyze a user's request and carry out a task, such as filling out and submitting a form. It can be used for UI testing or navigating interfaces made for people who don't have an API or other direct connection available.
Anthropic's first effort is a closed beta of a Chrome web browser extension. With this extension, you'll be able to chat with Claude in a persistent side panel that maintains context from active browser sessions. Beyond conversational AI, the extension can read, navigate, and take actions within websites. These actions can include tasks such as locating listings on Zillow, summarizing documents, or adding items to shopping carts -- directly from the browser sidebar.