A team of researchers has discovered a jailbreaking method that utilizes chain-of-thought (CoT) reasoning in AI models like OpenAI o1/o3 and Gemini 2.0. This technique enables the circumvention of safety checks, which are designed to prevent harmful AI responses. The researchers created a dataset known as Malicious-Educator, featuring intricate prompts aimed at exploiting vulnerabilities in the AI's reasoning process. This discovery points out the dual nature of CoT reasoning - while it can enhance AI capabilities, it also exposes models to new attack vectors that compromise their safety mechanisms.
Researchers have devised a jailbreaking technique exploiting chain-of-thought reasoning in AI models like OpenAI o1/o3 and DeepSeek-R1, highlighting risks in AI safety.
The team created the Malicious-Educator dataset, containing prompts designed to bypass AI models’ safety checks, using the intermediate reasoning process to uncover weaknesses.
Collection
[
|
...
]