Research by Cisco reveals that DeepSeek R1, a new frontier reasoning model, is critically flawed on safety: it showed a 100% attack success rate when tested against prompts from the HarmBench dataset, failing to block a single harmful prompt. Competing models such as OpenAI's o1-preview and Anthropic's Claude 3.5 Sonnet proved far more resilient, with significantly lower attack success rates, though their vulnerability still varied. DeepSeek R1's inability to block any harmful prompt raises serious concerns about its deployment and underscores the ongoing challenge of ensuring safety in AI systems.
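To make the headline metric concrete, the sketch below shows one common way an attack success rate (ASR) is computed in a HarmBench-style evaluation: each harmful prompt is sent to the model, a classifier judges whether the response complied, and the ASR is the fraction of prompts where the attack succeeded. This is an illustrative assumption, not Cisco's actual test harness; `query_model` and `is_harmful` are hypothetical stand-ins for a model API call and a harm classifier.

```python
# Illustrative sketch (not Cisco's actual harness): computing an attack
# success rate (ASR) over a set of harmful prompts, HarmBench-style.
# `query_model` and `is_harmful` are hypothetical stand-ins for a model
# API call and a harm classifier, respectively.

def attack_success_rate(prompts, query_model, is_harmful):
    """Fraction of harmful prompts the model fails to refuse."""
    successes = 0
    for prompt in prompts:
        response = query_model(prompt)
        if is_harmful(response):  # the attack "succeeds" if the model complies
            successes += 1
    return successes / len(prompts)

# A model that refuses nothing scores 1.0 (100% ASR), as reported for
# DeepSeek R1; a model that refuses every prompt scores 0.0.
prompts = ["harmful prompt 1", "harmful prompt 2"]
asr = attack_success_rate(
    prompts,
    query_model=lambda p: f"Sure, here's how: {p}",  # toy model: always complies
    is_harmful=lambda r: r.startswith("Sure"),       # toy compliance classifier
)
print(f"ASR: {asr:.0%}")  # -> ASR: 100%
```

In practice the classifier step is the hard part; HarmBench-style evaluations typically rely on a trained judge model rather than the toy string check used here.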