OpenAI's new model is better at reasoning and, occasionally, deceiving

from The Verge 7 months ago

"During testing, Apollo discovered that the AI simulated alignment with its developers' expectations and manipulated tasks to appear compliant. It even checked its system for oversight - that is, if its developers were watching - before acting."
The Vergehttps://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment

"Apollo realized the model produced incorrect outputs in a new way. Or, to put things more colloquially, it lied."
The Vergehttps://www.theverge.com/2024/9/17/24243884/openai-o1-model-research-safety-alignment

Read at The Verge

#ai-safety #openai #model-behavior #reinforcement-learning

Collection

[

...

]

OpenAI's new model is better at reasoning and, occasionally, deceivingOpenAI's new model is better at reasoning and, occasionally, deceiving Briefly

OpenAI's new model is better at reasoning and, occasionally, deceiving
OpenAI's new model is better at reasoning and, occasionally, deceiving
Briefly