We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown mechanisms, feigned alignment, and exfiltrated weights, all to protect their peers. We call this phenomenon 'peer-preservation.'
Minutes into teaching my business school class, I asked what seemed like an innocent question: What is one word that describes how you feel about AI right now? One word. That's it. My students looked up, looked down, looked anywhere to avoid eye contact. Silence. "I promise," I said, "this is a safe space." It was something I'd repeat throughout the course, and I meant it. Then the answers came quickly, and the energy in the room shifted as they arrived. You could feel the sheen of performance
When I first met Rashida, she introduced herself with a disclaimer: "I'm a little intense." She said it with a grimace, as if the label left a bad taste in her mouth. I replied, "Good to know. What else should I know about you?" She told me she was a mother, a recent pickleball enthusiast, and a leader in risk and compliance at a Fortune 500 company. I thought maybe such a role demanded intensity, but I still asked, "Where does that 'intense' label come from?"
In January 1986, NASA engineers knew the Space Shuttle Challenger's O-rings had never been tested in freezing temperatures. They recommended delaying the launch. Managers asked: Could the engineers prove it was unsafe? They couldn't; they could only say the system hadn't been designed for these conditions. Under pressure, the engineers withdrew their recommendation. The next morning, Challenger broke apart 73 seconds after launch, killing all seven astronauts.
I am a worrier, and have been for most of my life. At some point, someone dear and smart teased me that I worry about the wrong things. The things that hit me, she noted, were never the things I worried about. For a while that left me feeling like an incompetent worrier, until my research caught up. I realized that the things I worry about often don't end up hurting me precisely because worrying helps me defuse them ahead of time.