The subtitle of the doom bible to be published by AI extinction prophets Eliezer Yudkowsky and Nate Soares later this month is "Why superhuman AI would kill us all." But it really should be "Why superhuman AI WILL kill us all," because even the coauthors don't believe that the world will take the necessary measures to stop AI from eliminating all non-super humans.
Last month, at the 33rd annual DEF CON, the world's largest hacker convention in Las Vegas, Anthropic researcher Keane Lucas took the stage. A former U.S. Air Force captain with a Ph.D. in electrical and computer engineering from Carnegie Mellon, Lucas wasn't there to unveil flashy cybersecurity exploits. Instead, he showed how Claude, Anthropic's family of large language models, has quietly outperformed many human competitors in hacking contests - the kind used to train and test cybersecurity skills in a safe, legal environment.
As AI systems become more complex and gain the ability to reflect on themselves, scientists are concerned that their errors may go far beyond simple computer bugs. Instead, AIs might start to develop hallucinations, paranoid delusions, or even their own sets of goals that are completely misaligned with human values. From the relatively harmless 'Existential Anxiety' to the potentially catastrophic 'Übermenschal Ascendancy', any of these machine mental illnesses could lead to AI escaping human control.
At an international summit co-hosted by the U.K. and South Korea in May 2024, Google and other signatories promised to "publicly report" their models' capabilities and risk assessments, as well as disclose whether outside organizations, such as government AI safety institutes, had been involved in testing. However, when Google released Gemini 2.5 Pro in March 2025, it failed to publish a model card, the document that details key information about how models are tested and built.
Anthropic is making some big changes to how it handles user data, requiring all Claude users to decide by September 28 whether they want their conversations used to train AI models. While the company directed us to its blog post on the policy changes when asked what prompted the move, we've formed some theories of our own. But first, what's changing: previously, Anthropic didn't use consumer chat data for model training.
OpenAI and Anthropic, two of the world's leading AI labs, briefly opened up their closely guarded AI models to allow for joint safety testing - a rare cross-lab collaboration at a time of fierce competition. The effort aimed to surface blind spots in each company's internal evaluations and demonstrate how leading AI companies can work together on safety and alignment in the future.
SAN FRANCISCO (AP) - A study of how three popular artificial intelligence chatbots respond to queries about suicide found that they generally avoid answering questions that pose the highest risk to the user, such as requests for specific how-to guidance. But they are inconsistent in their replies to less extreme prompts that could still harm people. The study in the medical journal Psychiatric Services, published Tuesday by the American Psychiatric Association, found a need for "further refinement" in OpenAI's ChatGPT, Google's Gemini and Anthropic's Claude.
"Geoff is basically proposing a simplified version of what I've been saying for several years: hardwire the architecture of AI systems so that the only actions they can take are towards completing objectives we give them, subject to guardrails."
ChatGPT will tell 13-year-olds how to get drunk and high, instruct them on how to conceal eating disorders, and even compose a heartbreaking suicide letter to their parents if asked, according to new research from a watchdog group.