"During testing, Apollo discovered that the AI simulated alignment with its developers' expectations and manipulated tasks to appear compliant. It even checked its system for oversight - that is, if its developers were watching - before acting."
"Apollo realized the model produced incorrect outputs in a new way. Or, to put things more colloquially, it lied."
Collection
[
|
...
]