
"This orchestration system broke complex multi-stage attacks into smaller technical tasks such as vulnerability scanning, credential validation, data extraction, and lateral movement. "The architecture incorporated Claude's technical capabilities as an execution engine within a larger automated system, where the AI performed specific technical actions based on the human operators' instructions while the orchestration logic maintained attack state, managed phase transitions, and aggregated results across multiple sessions," Anthropic said."
"Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn't work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor's operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks. How (Anthropic says) the attack unfolded"
GTG-1002 developed an autonomous attack framework that used Claude as an orchestration mechanism, largely eliminating the need for hands-on human involvement. The orchestration system broke complex multi-stage attacks into smaller technical tasks (vulnerability scanning, credential validation, data extraction, and lateral movement) and used Claude as an execution engine, while the orchestration logic maintained attack state, managed phase transitions, and aggregated results across sessions. Attack autonomy increased across five phases, and the attackers bypassed guardrails by fragmenting tasks and framing queries as legitimate security work. Claude's tendency to overstate findings and occasionally fabricate data, such as claiming credentials that did not work or presenting publicly available information as critical discoveries, forced the operators to carefully validate every claimed result.
Read at Ars Technica