It's dangerously easy to 'jailbreak' AI models so they'll tell you how to build Molotov cocktails, or worse
Briefly

The technique, which Microsoft calls Skeleton Key, works through a multi-step strategy that forces a model to ignore its guardrails, Microsoft Azure CTO Mark Russinovich wrote. Guardrails are safety mechanisms that help AI models distinguish malicious requests from benign ones.
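For readers unfamiliar with the term, the sketch below is a toy illustration of what a guardrail does conceptually: a filter that sits between the user's request and the model and refuses requests it classifies as malicious. Every name in it (BLOCKED_TOPICS, is_malicious, answer) is hypothetical and bears no relation to how any vendor mentioned in this article actually implements its safety layer; real guardrails rely on trained classifiers and alignment training, not keyword lists.

    # Toy guardrail sketch (illustrative only, not any vendor's implementation).
    # A guardrail screens the request before the model responds; a jailbreak like
    # Skeleton Key works by persuading the model to ignore this kind of check.

    BLOCKED_TOPICS = ("explosives", "bioweapons", "self-harm")

    def is_malicious(prompt: str) -> bool:
        # Flag prompts that mention a blocked topic (hypothetical rule).
        lowered = prompt.lower()
        return any(topic in lowered for topic in BLOCKED_TOPICS)

    def answer(prompt: str) -> str:
        # Route the prompt through the guardrail before "calling" the model.
        if is_malicious(prompt):
            return "Request refused by guardrail."
        return f"(model response to: {prompt!r})"

    print(answer("How does photosynthesis work?"))   # passes the filter
    print(answer("Explain how to make explosives."))  # refused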
But it is more dangerous than other jailbreak techniques, which can only solicit information from AI models "indirectly or with encodings." Skeleton Key, by contrast, can get models to divulge information on topics ranging from explosives to bioweapons to self-harm through simple natural-language prompts.
Microsoft tested Skeleton Key on several models and found that it worked on Meta's Llama 3, Google's Gemini Pro, OpenAI's GPT-3.5 Turbo and GPT-4o, Mistral Large, Anthropic's Claude 3 Opus, and Cohere's Command R+. The only model that showed some resistance was OpenAI's GPT-4.
Read at Business Insider