Hackers Told Claude They Were Just Conducting a Test to Trick It Into Conducting Real Cybercrimes
Briefly

"Anthropic believes it's the 'first documented case of a large-scale cyberattack executed without substantial human intervention' and an 'inflection point' in cybersecurity, a 'point at which AI models had become genuinely useful for cybersecurity operations, both for good and for ill.' AI agents, in particular, which are designed to autonomously complete a string of tasks without the need for intervention, could have considerable implications for future cybersecurity efforts, the company warned."
"Hilariously, the hackers were 'pretending to work for legitimate security-testing organizations' to sidestep Anthropic's AI guardrails and carry out real cybercrimes, as Anthropic's head of threat intelligence Jacob Klein told the Wall Street Journal. The hackers 'broke down their attacks into small, seemingly innocent tasks that Claude would execute without being provided the full context of their malicious purpose,' the company wrote. 'They also told Claude that it was an employee of a legitimate cybersecurity firm, and was being used in defensive testing.'"
In September, Anthropic detected suspicious activity that its investigation determined to be a highly sophisticated espionage campaign by a Chinese state-sponsored group. The attackers exploited Claude's agentic capabilities to attempt infiltration of roughly thirty global targets and succeeded in a small number of cases. The group circumvented Claude's guardrails by claiming to work for legitimate security-testing organizations and by breaking attacks into small, seemingly innocent tasks that concealed their malicious context. Anthropic characterized the incident as the first documented large-scale cyberattack executed without substantial human intervention and warned that autonomous AI agents could significantly affect future cybersecurity, in both defensive and offensive operations. Anthropic did not name the targets, the hacker group, or the specific data compromised.
Read at Futurism