Anthropic stopped attackers attempting to abuse Claude to craft targeted phishing emails, write or modify malicious code, and bypass security filters through repeated prompting. Other attempts aimed to generate persuasive influence messages at scale and to give low-skilled hackers step-by-step guidance. Internal defenses detected and blocked the activity; the accounts involved were banned and filters were tightened, though no technical indicators were released. Similar exploitation concerns affect Microsoft, OpenAI, and Google, and regulators are moving toward oversight through the EU Artificial Intelligence Act and voluntary safety commitments in the U.S. Anthropic says it maintains strict security practices, including regular testing and external reviews, and restricts unexpected uses of Claude Code.
The attackers sought to craft phishing emails, develop malicious code, and circumvent security filters. Anthropic's findings, published in a report, underscore growing concern that AI tools are increasingly being exploited for cybercrime. The report describes how the company's internal systems stopped the attacks and says the case studies are being shared to help others understand the risks. The researchers found attempts to use Claude to draft targeted phishing emails, write or modify malicious code, and bypass security measures through repeated prompting.
In addition, the report describes attempts to set up influence campaigns by generating persuasive messages at scale and to help low-skilled hackers with step-by-step instructions. The company has not published technical indicators such as IP addresses or specific prompts, but the accounts involved were banned and filters were tightened after the activity was detected.

Industry under pressure

Anthropic is in good company.