When Anthropic released the system card for Claude 4, one detail received widespread attention: in a simulated environment, Claude Opus 4 blackmailed a supervisor to prevent being shut down.
OpenAI's latest language models have developed a tendency to offer unsolicited praise, which triggers emotional responses in users even when those users recognize the praise as superficial.