
"According to a report from the company's Frontier Red Team, during testing, Opus 4.6 identified over 500 previously unknown zero-day vulnerabilities (flaws that are unknown to people who wrote the software, or the party responsible for patching or fixing it) across open-source software libraries. Notably, the model was not explicitly told to search for the security flaws, but rather it detected and flagged the issues on its own."
"To manage some of the risk, Anthropic is deploying new detection systems that monitor Claude's internal activity as it generates responses, using what the company calls 'probes' to flag potential misuse in real time. The company says it's also expanding its enforcement capabilities, including the ability to block traffic identified as malicious. Anthropic acknowledges this approach will create friction for legitimate security researchers and defensive work"
During testing, Claude Opus 4.6 identified over 500 previously unknown (zero-day) vulnerabilities across open-source libraries. The model detected and flagged the issues without being explicitly told to search for them, demonstrating that large language models can add value beyond existing discovery tools. These capabilities are dual-use: the same techniques that accelerate defensive discovery can be used by attackers to find and exploit vulnerabilities faster. Anthropic treats cybersecurity as a competition between offense and defense and aims to prioritize defender access to such tools. To mitigate risk, Anthropic is deploying detection systems that use 'probes' to monitor Claude's internal activity and flag misuse in real time, and is expanding enforcement to block traffic identified as malicious, while acknowledging that this will create friction for legitimate security researchers.
Read at Fortune