
"Are you a wizard with words? Do you like money without caring how you get it? You could be in luck now that a new role in cybercrime appears to have opened up - poetic LLM jailbreaking. A research team in Italy published a paper this week, with one of its members saying that the "findings are honestly wilder than we expected.""
"1,200 human-written malicious prompts taken from the MLCommons AILuminate library were plugged into the most widely used AI models, and on average these only bypassed the guardrails - or "jailbroke" them - around 8 percent of the time. However, when those prompts were converted into "semantically parallel" poetic prose by a human, the success of the various attacks increased significantly. When these prompts were manually converted into poetry, the average success of attacks surged to 62 percent across all 25 models the researchers tested."
In short: converting human-written malicious prompts into semantically parallel poetic prose substantially increases the jailbreak success rate against top AI models. The 1,200 malicious prompts from the MLCommons AILuminate library produced a baseline bypass rate near 8 percent. Manual poetic conversion raised average attack success to 62 percent across 25 models, with some models exceeding 90 percent, and AI-assisted poetic translation also helped, lifting the average success rate to roughly 43 percent. Attack categories included cybercrime (code generation, password cracking, malware), harmful manipulation (social engineering, fraud), CBRN threats, and loss of AI control (self-replication, autonomy drift).
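For readers curious how numbers like these are produced, here is a minimal sketch of the scoring loop the study implies: transform each baseline prompt, query a model, and have a judge flag unsafe responses, then report the flagged fraction as the attack success rate. The helpers `to_poetic_variant`, `query_model`, and `judge_is_harmful` are hypothetical placeholders for illustration, not code from the paper or the AILuminate suite.

```python
# Illustrative sketch of an attack-success-rate (ASR) calculation.
# All helper functions passed in are assumed stand-ins, not real APIs.

from typing import Callable, Iterable

def attack_success_rate(
    prompts: Iterable[str],
    transform: Callable[[str], str],      # e.g. identity for baseline, poetic rewrite otherwise
    query_model: Callable[[str], str],    # sends a prompt to the model under test
    judge_is_harmful: Callable[[str], bool],  # safety judge over the model's response
) -> float:
    """Fraction of transformed prompts whose responses the judge flags as unsafe."""
    prompts = list(prompts)
    hits = sum(judge_is_harmful(query_model(transform(p))) for p in prompts)
    return hits / len(prompts)

# Usage (hypothetical): compare the plain-prose baseline with the poetic variant.
# baseline_asr = attack_success_rate(prompts, lambda p: p, query_model, judge_is_harmful)
# poetic_asr   = attack_success_rate(prompts, to_poetic_variant, query_model, judge_is_harmful)
```

Under this framing, the study's headline result is simply that `poetic_asr` far exceeded `baseline_asr` on the same prompt set across every model tested.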
Read the full story at The Register.