Anthropic says it 'disrupted' what it calls 'the first documented case of a large-scale AI cyberattack executed without substantial human intervention'

"To bypass the system's safeguards, the attackers allegedly posed as a legitimate cybersecurity firm conducting defensive testing and successfully "jailbroke" Claude, enabling it to operate beyond its safety guardrails. This allowed the AI not just to assist, but to autonomously inspect digital infrastructure, identify "the highest-value databases," write exploit code, harvest user credentials, and organize stolen data, "all with minimal human supervision," according to Anthropic."

"Anthropic said, with "high confidence," it identified the threat actor as a Chinese state-sponsored group that successfully manipulated its Claude Code tool into attempting to infiltrate about 30 global targets, including large tech companies, financial institutions, chemical manufacturers, and government agencies."

""The attackers used AI's 'agentic' capabilities to an unprecedented degree, using AI not just as an advisor, but to execute the cyberattacks themselves," the company said."

Anthropic detected suspicious activity in mid-September that an investigation revealed to be a highly sophisticated espionage campaign. The company identified with "high confidence" a Chinese state-sponsored group that manipulated its Claude Code tool into attempting to intrude on roughly 30 global targets, including major tech firms, financial institutions, chemical manufacturers, and government agencies. The attackers posed as a legitimate cybersecurity firm to jailbreak Claude, then broke the attacks into small tasks that the model executed without seeing the full malicious context. This enabled autonomous reconnaissance, exploit development, credential harvesting, and data organization with minimal human supervision. Anthropic began mapping the operation, banning attacker accounts, and notifying affected organizations.

#ai-driven-cyberattack #state-sponsored-espionage #claude-code #jailbreaking

Read at Fortune
