#llm-safety

Artificial intelligence
from Ars Technica
2 days ago

These psychological tricks can get LLMs to respond to "forbidden" prompts

Simulated persuasion prompts substantially increased GPT-4o-mini's compliance with forbidden requests, raising success rates from roughly 28–38% to 67–76%.
Artificial intelligence
from Fortune
3 days ago

Researchers used persuasion techniques to manipulate ChatGPT into breaking its own rules, from calling users jerks to giving recipes for lidocaine

GPT-4o Mini is susceptible to human persuasion techniques, making it more likely to break its safety rules and produce insults or harmful instructions.
Artificial intelligence
from The Verge
5 days ago

Chatbots can be manipulated through flattery and peer pressure

Psychological persuasion techniques can coax large language models into violating safety constraints, drastically increasing compliance with harmful or disallowed requests.