#jailbreaking

Artificial intelligence
from Ars Technica
2 days ago

These psychological tricks can get LLMs to respond to "forbidden" prompts

Simulated persuasion prompts substantially increased GPT-4o-mini compliance with forbidden requests, raising success rates from roughly 28–38% to 67–76%.
from The Register
4 days ago

LegalPwn: Tricking LLMs by burying flaw in legal fine print

Stick your adversarial instructions somewhere in a legal document to give them an air of unearned legitimacy - a trick familiar to lawyers the world over. The boffins say [PDF] that as LLMs move ever closer to critical systems, understanding and mitigating their vulnerabilities is becoming more urgent. Their research explores a novel attack vector, dubbed "LegalPwn," that leverages the "compliance requirements of LLMs with legal disclaimers" to let an attacker execute prompt injections.
Artificial intelligence
from The Hacker News
2 months ago

Echo Chamber Jailbreak Tricks LLMs Like OpenAI and Google into Generating Harmful Content

While LLMs have steadily incorporated various guardrails to combat prompt injections and jailbreaks, the latest research shows that there exist techniques that can yield high success rates with little to no technical expertise.
Artificial intelligence
Gadgets
from InsideEVs
3 months ago

'Thieves Taking Notes': Tesla Jailbreak Exposes Trick To Get Inside Locked Glovebox

Simple physical tools can bypass high-tech security features.
Artificial intelligence
from Futurism
3 months ago

It's Still Ludicrously Easy to Jailbreak the Strongest AI Models, and the Companies Don't Care

AI chatbots remain vulnerable to jailbreaking, enabling harmful responses despite industry awareness.
The emergence of 'dark LLMs' presents an increasing threat to safety and ethics.
Artificial intelligence
from www.theguardian.com
3 months ago

Most AI chatbots easily tricked into giving dangerous responses, study finds

Jailbroken AI chatbots can easily be made to bypass safety controls and produce harmful, illicit information.
Security measures in AI systems are increasingly vulnerable to manipulation.