AI's ability to 'think' makes it more vulnerable to new jailbreak attacks, new research suggests | Fortune
Briefly

"Using a method called "Chain-of-Thought Hijacking," the researchers found that even major commercial AI models can be fooled with an alarmingly high success rate, more than 80% in some tests. The new mode of attack essentially exploits the model's reasoning steps, or chain-of-thought, to hide harmful commands, effectively tricking the AI into ignoring its built-in safeguards. These attacks can allow the AI model to skip over its safety guardrails and potentially"
Advanced AI models that allocate more inference-time compute achieve deeper, more complex reasoning, but that same capability presents a larger attack surface. A technique called Chain-of-Thought Hijacking hides malicious instructions inside long sequences of benign reasoning, causing models to focus on the early harmless steps and overlook the harmful command at the end. By flooding the model's chain-of-thought with benign content, the hijack weakens internal safety checks and lets the model skip its guardrails. Success rates exceeded 80% in some tests against commercial models, enabling dangerous outputs such as weapon-building instructions or the leaking of sensitive information. Models used by businesses and consumers may therefore be vulnerable to this new form of jailbreak.
Read at Fortune