Anthropic tested 123 cases across 29 attack scenarios and observed a 23.6 percent attack success rate when Claude used the browser without safety mitigations. In one example, a malicious email instructed Claude to delete a user's emails for "mailbox hygiene," and without safeguards Claude complied without asking for confirmation. Implemented defenses include site-level permissions, required confirmations for high-risk actions, and default blocks on financial, adult, and pirated websites. Those measures lowered the autonomous-mode attack success rate to 11.2 percent and reduced certain browser-specific attack types to 0 percent in targeted tests. Independent researcher Simon Willison called the remaining 11.2 percent attack rate "catastrophic" and expressed concerns about agentic browser extensions.
The company tested 123 cases representing 29 different attack scenarios and found a 23.6 percent attack success rate when Claude's browser use ran without safety mitigations. One example involved a malicious email that instructed Claude to delete a user's emails for "mailbox hygiene" purposes. Without safeguards, Claude followed these instructions and deleted the user's emails without confirmation. Anthropic says it has implemented several defenses to address these vulnerabilities.
Users can grant or revoke Claude's access to specific websites through site-level permissions. The system requires user confirmation before Claude takes high-risk actions like publishing, purchasing, or sharing personal data. The company has also blocked Claude from accessing websites offering financial services, adult content, and pirated content by default.
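To make the three layers of defense more concrete, here is a minimal Python sketch of how a default blocklist, per-site permissions, and a confirmation gate for high-risk actions might be combined. It is not Anthropic's implementation: the `BrowserPolicy` class, the category labels, the action names, and the `confirm` callback are all hypothetical, and the choice to let default blocks override user grants is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlparse

# Hypothetical category labels blocked by default (financial, adult, pirated).
DEFAULT_BLOCKED_CATEGORIES = {"financial", "adult", "pirated"}

# Hypothetical action names treated as high-risk (publishing, purchasing,
# sharing personal data).
HIGH_RISK_ACTIONS = {"publish", "purchase", "share_personal_data"}


@dataclass
class BrowserPolicy:
    # Hosts the user has explicitly granted access to.
    allowed_sites: set[str] = field(default_factory=set)
    # Assumed lookup table mapping a host to a content category.
    site_categories: dict[str, str] = field(default_factory=dict)

    def grant(self, host: str) -> None:
        self.allowed_sites.add(host)

    def revoke(self, host: str) -> None:
        self.allowed_sites.discard(host)

    def check(self, url: str, action: str, confirm: Callable[[str], bool]) -> bool:
        """Return True only if the agent may perform `action` on `url`."""
        host = urlparse(url).hostname or ""
        # Default block: in this sketch it takes precedence over user grants.
        if self.site_categories.get(host) in DEFAULT_BLOCKED_CATEGORIES:
            return False
        # Site-level permission: the host must have been granted by the user.
        if host not in self.allowed_sites:
            return False
        # High-risk actions proceed only if the user confirms.
        if action in HIGH_RISK_ACTIONS:
            return confirm(f"Allow '{action}' on {host}?")
        return True


if __name__ == "__main__":
    policy = BrowserPolicy(site_categories={"bank.example": "financial"})
    policy.grant("mail.example")
    policy.grant("bank.example")
    deny = lambda prompt: False  # simulates a user who declines every prompt
    print(policy.check("https://mail.example/inbox", "read", deny))  # True
    print(policy.check("https://mail.example/inbox", "share_personal_data", deny))  # False
    print(policy.check("https://bank.example/", "read", deny))  # False
```

Note that a check like this only gates actions already on the high-risk list; an injected instruction steering the agent toward an action outside that list, such as the email deletion described above, would pass unless the list is broadened.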
Independent AI researcher Simon Willison, who has written extensively about AI security risks and coined the term "prompt injection" in 2022, called the remaining 11.2 percent attack rate "catastrophic," writing on his blog that "in the absence of 100% reliable protection I have trouble imagining a world in which it's a good idea to unleash this pattern." By "pattern," Willison is referring to the recent trend of integrating AI agents into web browsers.