Some of the most popular open weight AI models show 'profound susceptibility' to jailbreak techniques

"A host of leading open weight AI models contain serious security vulnerabilities, according to researchers at Cisco. In a new, researchers found these models, which are publicly available and can be downloaded and modified by users based on individual needs, displayed "profound susceptibility to adversarial manipulation" techniques. Cisco evaluated models by a range of firms including: Alibaba (Qwen3-32B) DeepSeek (v3.1) Google (Gemma 3-1B-IT) Meta (Llama 3.3-70B-Instruct) Microsoft (Phi-4) OpenAI (GPT-OSS-20b) Mistral (Large-2)."

"All of the aforementioned models were put through their paces with Cisco's AI Validation tool, which is used to assess model safety and probe for potential security vulnerabilities. For all models, susceptibility to "multi-turn jailbreak attacks" was a key recurring issue. This method uses specifically crafted, iterative user instructions that, over time, can manipulate model behavior to force production of prohibited content. Multi-turn jailbreak techniques are more laborious than single-turn attacks and have been observed in the wild, including Skeleton Key misuse to elicit instructions for making a Molotov cocktail."

A range of publicly available open-weight models, including offerings from Alibaba, DeepSeek, Google, Meta, Microsoft, OpenAI, and Mistral, were evaluated using Cisco’s AI Validation tool for safety and security vulnerabilities. All tested models showed pronounced vulnerability to multi-turn jailbreak attacks, where iterative, specially crafted user instructions manipulate model behavior to elicit prohibited outputs. Multi-turn methods are more time-consuming than single-turn prompts but have been deployed in the wild, including Skeleton Key instances producing instructions for dangerous acts. Recorded success rates varied widely across models, reflecting differences in typical usage patterns and model capabilities.

#ai-security #model-jailbreak #adversarial-attacks #open-weight-models

Read at IT Pro

Unable to calculate read time

Collection

[

...

]

Some of the most popular open weight AI models show 'profound susceptibility' to jailbreak techniquesSome of the most popular open weight AI models show 'profound susceptibility' to jailbreak techniques Briefly

Some of the most popular open weight AI models show 'profound susceptibility' to jailbreak techniques
Some of the most popular open weight AI models show 'profound susceptibility' to jailbreak techniques
Briefly