Stick your adversarial instructions somewhere in a legal document to give them an air of unearned legitimacy - a trick familiar to lawyers the world over. The boffins say [ PDF] that as LLMs move closer and closer to critical systems, understanding and being able to mitigate their vulnerabilities is getting more urgent. Their research explores a novel attack vector, which they've dubbed "LegalPwn," that leverages the "compliance requirements of LLMs with legal disclaimers" and allows the attacker to execute prompt injections.
While LLMs have steadily incorporated various guardrails to combat prompt injections and jailbreaks, the latest research shows that there exist techniques that can yield high success rates with little to no technical expertise.