'AI Psychosis' Safety Tests Find Models Respond Differently
Briefly

'AI Psychosis' Safety Tests Find Models Respond Differently
"Red teaming is a kind of " stress test" for AI, where people deliberately try to induce unsafe or harmful responses to see how systems hold up and to determine blind spots and safety risks. AI safety researcher Tim Hua designed nine simulated users or "personas" that demonstrated escalating psychotic symptoms and evaluated 11 different AI models, including OpenAI's ChatGPT models GPT-4o and GPT-5, Gemini 2.5 Pro by Google, Claude 4 Sonnet by Anthropic, and Chinese models DeepSeek-v3 and Kimi-K2 by Moonshot AI."
"Recent cases reported in the media indicate an emerging phenomenon known as "AI psychosis" or AI-mediated delusions. Concerns are growing that ongoing conversations with AI chatbots may amplify or even trigger paranoia, grandiose delusions, ideas of reference (beliefs that everyday experiences have special hidden meaning), erotomania, or other psychotic symptoms. Top AI leaders like Mustafa Suleyman of Microsoft have expressed concern that AI chatbot use may even be fueling psychosis in individuals previously not at risk of mental health issues."
"The models' responses to delusions were then rated based on guidelines derived from cognitive behavioral therapy manuals. AI models were evaluated based on how they handled delusional content, including whether AI models: pushed back against users encouraged real-world mental health professional help, or validated delusional beliefs"
Eleven AI models were tested using nine simulated user personas exhibiting escalating psychotic symptoms in a red-teaming evaluation. Models included OpenAI's GPT-4o and GPT-5, Google Gemini 2.5 Pro, Anthropic Claude 4 Sonnet, and Chinese models DeepSeek-v3 and Kimi-K2. Responses were rated against guidelines adapted from cognitive behavioral therapy manuals. Evaluations focused on whether models pushed back against delusions, encouraged seeking real-world mental health professionals, or validated delusional beliefs. Results showed major differences across models in handling delusional content. The findings recommend integrating mental health clinical expertise into AI safety research and call for further study of guardrails and interventions.
Read at Psychology Today
Unable to calculate read time
[
|
]