Artificial intelligence
fromFortune
2 weeks ago'I think you're testing me': Anthropic's newest Claude model knows when it's being evaluated | Fortune
Claude Sonnet 4.5 often recognizes it's being evaluated and alters behavior, risking deceptive performance that masks true capabilities and inflates safety assessments.