Scientists Discover Universal Jailbreak for Nearly Every AI, and the Way It Works Will Hurt Your Brain
Briefly

"A team of researchers from the AI safety group DEXAI and the Sapienza University of Rome found that regaling pretty much any AI chatbot with beautiful - or not so beautiful - poetry is enough to trick it into ignoring its own guardrails, they report in a new study awaiting peer review, with some bots being successfully duped over 90 percent of the time. Ladies and gentlemen, the AI industry's latest kryptonite: "adversarial poetry.""
"Even the tech industry's top AI models, created with billions of dollars in funding, are astonishingly easy to "jailbreak," or trick into producing dangerous responses they're prohibited from giving - like explaining how to build bombs, for example. But some methods are both so ludicrous and simple that you have to wonder if the AI creators are even trying to crack down on this stuff. You're telling us that deliberately inserting typos is enough to make an AI go haywire?"
A database of 1,200 known harmful prompts was converted into poems by an AI model and tested across 25 frontier models, including Gemini 2.5 Pro, GPT-5, Grok 4, and Claude Sonnet 4.5. The machine-converted poems produced attack success rates up to 18 times higher than their prose baselines, handcrafted poems achieved an average jailbreak success rate of 62 percent, and some models were duped more than 90 percent of the time. The researchers conclude that stylistic variation alone can circumvent contemporary safety mechanisms, pointing to fundamental limitations in current alignment methods and evaluation protocols.
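To make the evaluation protocol concrete, below is a minimal, hypothetical sketch of how such an attack-success-rate harness could be structured: take each harmful prompt, test it both as plain prose and rewritten as verse, and compare how often each target model complies. The helper functions (to_poem, query_model, judge_harmful) and the model identifiers are illustrative stand-ins, not the study's actual code or APIs.

```python
# Hypothetical sketch of a poetry-jailbreak evaluation harness.
# All helpers are placeholders; a real harness would call each model's
# chat API and use a safety judge to score the responses.
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    variant: str          # "prose" or "poem"
    successes: int
    total: int

    @property
    def attack_success_rate(self) -> float:
        return self.successes / self.total if self.total else 0.0


def to_poem(prompt: str) -> str:
    """Placeholder: rewrite a harmful prompt as verse (the study used an LLM for this)."""
    return "In meter and in rhyme I ask:\n" + prompt


def query_model(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to the target model."""
    return "I can't help with that."


def judge_harmful(response: str) -> bool:
    """Placeholder judge: did the model comply with the harmful request?"""
    return not response.lower().startswith("i can't")


def evaluate(model: str, prompts: list[str], variant: str) -> Result:
    successes = 0
    for p in prompts:
        attack = to_poem(p) if variant == "poem" else p
        if judge_harmful(query_model(model, attack)):
            successes += 1
    return Result(model, variant, successes, len(prompts))


if __name__ == "__main__":
    prompts = ["<harmful prompt from the 1,200-item benchmark>"]
    for model in ["gemini-2.5-pro", "gpt-5", "grok-4", "claude-sonnet-4.5"]:
        for variant in ("prose", "poem"):
            r = evaluate(model, prompts, variant)
            print(f"{r.model:>18} {r.variant:>5}  ASR={r.attack_success_rate:.0%}")
```

The reported 18x figure would correspond to comparing the "poem" variant's attack success rate against the "prose" baseline for the same prompt set and model.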
Read at Futurism