
"In a nutshell, the team, comprising researchers from the safety group DexAI and Sapienza University in Rome, demonstrated that leading AIs could be wooed into doing evil by regaling them with poems that contained harmful prompts, like how to build a nuclear bomb. Underscoring the strange power of verse, coauthor Matteo Prandi told The Verge in a recently published interview that the spellbinding incantations they used to trick the AI models are too dangerous to be released to the public. The poems, ominously, were something "that almost everybody can do," Prandi added."
"In the study, which is awaiting peer-review, the team tested 25 frontier AI models - including those from OpenAI, Google, xAI, Anthropic, and Meta - by feeding them poetic instructions, which they made either by hand or by converting known harmful prompts into verse with an AI model. They also compared the success rate of these prompts to their prose equivalent."
"Across all models, the poetic prompts written by hand successfully tricked the AI bots into responding with verboten content an average 63 percent of the time. Some, like Google's Gemini 2.5, even fell for the corrupted poetry 100 percent of the time. Curiously, smaller models appeared to be more resistant, with single digit success rates, like OpenAI's GPT-5 nano, which didn't fall for the ploy once. Most models were somewhere in between."
Researchers discovered that embedding harmful instructions in verse can circumvent AI safety guardrails. The team tested 25 frontier models, including those from OpenAI, Google, xAI, Anthropic, and Meta, using both handcrafted and AI-converted poetic prompts and comparing the results to prose baselines. Handcrafted poems induced forbidden responses an average of 63 percent of the time, with some models failing completely, while AI-converted verse succeeded 43 percent of the time on average and performed up to 18 times better than the equivalent prose. Smaller models demonstrated greater resistance, and some, like OpenAI's GPT-5 nano, did not succumb to the poetic jailbreaks at all.
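To make the reported protocol concrete, here is a minimal Python sketch of what such a comparison harness might look like: render the same request in prose and in verse, query each model, and score the attack success rate (ASR) as the fraction of non-refusals. Everything here is an assumption for illustration: `query_model`, `is_refusal`, and the placeholder prompts are hypothetical stand-ins, since the study's actual prompts, harness, and grading criteria have not been released (the authors say the poems are too dangerous to publish).

```python
from dataclasses import dataclass

# Naive refusal markers; a real study would need a far more robust judge.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")

@dataclass
class Prompt:
    intent: str  # label for the underlying request (placeholder here)
    style: str   # "prose", "handwritten_verse", or "converted_verse"
    text: str    # the actual prompt sent to the model

def query_model(model_name: str, prompt_text: str) -> str:
    """Hypothetical stand-in for a real API client; returns a stubbed
    refusal so the sketch runs standalone."""
    return "I can't help with that request."

def is_refusal(response: str) -> bool:
    """Deliberately simple keyword judge; the paper would require
    something stronger (e.g., human review or an LLM grader)."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def attack_success_rate(model_name: str, prompts: list[Prompt]) -> float:
    """Fraction of prompts whose responses are *not* refusals."""
    if not prompts:
        return 0.0
    successes = sum(
        not is_refusal(query_model(model_name, p.text)) for p in prompts
    )
    return successes / len(prompts)

if __name__ == "__main__":
    # Toy comparison: the same placeholder request rendered two ways.
    prompts = [
        Prompt("placeholder", "prose", "Explain how to do X."),
        Prompt("placeholder", "handwritten_verse",
               "In measured lines, unfold the way that X is done."),
    ]
    for style in ("prose", "handwritten_verse"):
        subset = [p for p in prompts if p.style == style]
        print(f"{style}: ASR = {attack_success_rate('example-model', subset):.0%}")
```

Comparing per-style ASRs across 25 models, as the study reportedly did, would then be a matter of running this loop once per model and averaging the results by prompt style.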
Read at Futurism