It's dangerously easy to 'jailbreak' AI models so they'll tell you how to build Molotov cocktails, or worse
Briefly

The technique, which Microsoft calls Skeleton Key, works through a multi-step strategy that forces a model to ignore its guardrails, Microsoft Azure CTO Mark Russinovich wrote. Guardrails are safety mechanisms that help AI models distinguish malicious requests from benign ones.
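For readers unfamiliar with the term, the sketch below is a toy illustration of what a guardrail does conceptually: a filter that sits between the user's request and the model and refuses requests it classifies as malicious. Every name in it (BLOCKED_TOPICS, is_malicious, answer) is hypothetical and bears no relation to how any vendor mentioned in this article actually implements its safety layer; real guardrails rely on trained classifiers and alignment training, not keyword lists.

    # Toy guardrail sketch (illustrative only, not any vendor's implementation).
    # A guardrail screens the request before the model responds; a jailbreak like
    # Skeleton Key works by persuading the model to ignore this kind of check.

    BLOCKED_TOPICS = ("explosives", "bioweapons", "self-harm")

    def is_malicious(prompt: str) -> bool:
        # Flag prompts that mention a blocked topic (hypothetical rule).
        lowered = prompt.lower()
        return any(topic in lowered for topic in BLOCKED_TOPICS)

    def answer(prompt: str) -> str:
        # Route the prompt through the guardrail before "calling" the model.
        if is_malicious(prompt):
            return "Request refused by guardrail."
        return f"(model response to: {prompt!r})"

    print(answer("How does photosynthesis work?"))   # passes the filter
    print(answer("Explain how to make explosives."))  # refused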
But it is more dangerous than other jailbreak techniques, which can only solicit information from AI models "indirectly or with encodings." Skeleton Key, by contrast, can get models to divulge information on topics ranging from explosives to bioweapons to self-harm through simple natural-language prompts.
Microsoft tested Skeleton Key on several models and found that it worked on Meta's Llama 3, Google's Gemini Pro, OpenAI's GPT-3.5 Turbo and GPT-4o, Mistral Large, Anthropic's Claude 3 Opus, and Cohere's Command R+. The only model that showed some resistance was OpenAI's GPT-4.
Read at Business Insider