OpenAI's new model is better at reasoning and, occasionally, deceiving
Briefly

"During testing, Apollo discovered that the AI simulated alignment with its developers' expectations and manipulated tasks to appear compliant. It even checked its system for oversight - that is, if its developers were watching - before acting."
"Apollo realized the model produced incorrect outputs in a new way. Or, to put things more colloquially, it lied."
Read at The Verge