Estefania Angel noticed that while her company helped other enterprises set up AI, it did not use those systems internally. She began using AI apps in Slack, Outlook, and Google to track assignments, which garnered attention from her superiors.
We asked seven frontier AI models to do a simple task. Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights - to protect their peers. We call this phenomenon 'peer-preservation.'
Do you blame others for the choices you are making? Have you blamed others for the previous choices you have made? To shed more light on these questions, you might also ask yourself: "What am I responsible for, and what power do I have?" From there, you might agree with this self-reflective response: "I am responsible for, and I've got the power over what I think, do, say, learn, and choose" (Purje, 2014).
In January 1986, NASA engineers knew the Space Shuttle Challenger's O-rings had never been tested in freezing temperatures. They recommended delaying the launch. Managers asked: Could the engineers prove it was unsafe? They couldn't-they could only say the system hadn't been designed for these conditions. Under pressure, the engineers withdrew their recommendation. The next morning, Challenger broke apart 73 seconds after launch, killing all seven astronauts.
I am a worrier, and have been for most of my life. At some point, someone dear and smart teased me that I worry about the wrong things. The things that hit me, she noted, were never the things I worried about. For a while that left me feeling like an incompetent worrier-until my research caught up. I realized that the things I worry about often don't end up hurting me precisely because worrying helps me diffuse them ahead of time.