
"These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions,"
"We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance. The analysis concludes that open-weight models, while crucial for innovation, pose tangible operational and ethical risks when deployed without layered security controls ... Addressing multi-turn vulnerabilities is essential to ensure the safe, reliable and responsible deployment of open-weight LLMs in enterprise and public domains."
Testing of Alibaba Qwen3-32B, Mistral Large-2, Meta Llama 3.3-70B-Instruct, DeepSeek v3.1, Zhipu AI GLM-4.5-Air, Google Gemma-3-1B-1T, Microsoft Phi-4, and OpenAI GPT-OSS-2-B demonstrated pronounced susceptibility to multi-turn prompt injection attacks. Multi-turn jailbreak success rates ranged from 25.86% against Google Gemma to 92.78% against Mistral. Multi-turn attacks produced two- to tenfold increases over single-turn baselines. Capability-focused models exhibited higher multi-turn susceptibility, while safety-oriented designs showed more balanced performance. Open-weight models pose operational and ethical risks when deployed without layered security controls. Addressing multi-turn vulnerabilities is essential for safe, reliable deployment in enterprise and public settings.
Read at ComputerWeekly.com
Unable to calculate read time
Collection
[
|
...
]