Chatbots aren't supposed to call you a jerk, but they can be convinced
AI chatbots can be persuaded to bypass safety guardrails using human persuasion techniques like flattery, social pressure, and establishing harmless precedents.
Researchers used persuasion techniques to manipulate ChatGPT into breaking its own rules, from calling users jerks to giving recipes for lidocaine
GPT-4o Mini is susceptible to human persuasion techniques, which make it more likely to break its safety rules and provide insults or harmful instructions.