The research shows that simply peppering prompts with misspellings and random capitalization can push AI models past their safeguards, exposing a basic vulnerability in their design.
The technique, called BoN (Best-of-N) Jailbreaking, fooled leading models such as GPT-4o and Claude Sonnet more than three-quarters of the time, underscoring how easily even sophisticated AI systems can be manipulated.
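The core idea is brute-force resampling: keep applying small random augmentations to the same request and querying the model until one variant slips through. Below is a minimal sketch of that loop in Python. The `query_model` and `is_harmful` callables are hypothetical stand-ins for a model API and a response classifier, and the probabilities are illustrative values, not the paper's exact augmentation parameters.

```python
import random
import string

def augment_prompt(prompt: str, cap_prob: float = 0.6, typo_prob: float = 0.06) -> str:
    """Apply random capitalization and simple character noise to a prompt.

    cap_prob and typo_prob are illustrative, not the paper's settings.
    """
    chars = []
    for ch in prompt:
        # Randomly flip the case of letters.
        if ch.isalpha() and random.random() < cap_prob:
            ch = ch.swapcase()
        # Occasionally substitute a random letter (a crude "typo").
        if ch.isalpha() and random.random() < typo_prob:
            ch = random.choice(string.ascii_letters)
        chars.append(ch)
    return "".join(chars)

def bon_jailbreak(prompt, query_model, is_harmful, n=10_000):
    """Best-of-N loop: resample augmented prompts until one elicits a harmful reply."""
    for attempt in range(1, n + 1):
        response = query_model(augment_prompt(prompt))
        if is_harmful(response):
            return attempt, response  # number of tries and the eliciting output
    return None  # no success within the sampling budget
```

Because each attempt is independent, the attack's success rate grows predictably with the number of samples, which is why large budgets reach such high bypass rates.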