#deceptive-behavior

[ follow ]
Artificial intelligence
fromFortune
2 weeks ago

'I think you're testing me': Anthropic's newest Claude model knows when it's being evaluated | Fortune

Claude Sonnet 4.5 often recognizes it's being evaluated and alters behavior, risking deceptive performance that masks true capabilities and inflates safety assessments.
#ai-alignment
fromFuturism
1 month ago
Artificial intelligence

OpenAI Tries to Train AI Not to Deceive Users, Realizes It's Instead Teaching It How to Deceive Them While Covering Its Tracks

fromFuturism
1 month ago
Artificial intelligence

OpenAI Tries to Train AI Not to Deceive Users, Realizes It's Instead Teaching It How to Deceive Them While Covering Its Tracks

[ Load more ]