#agentic-misalignment

[ follow ]
fromsfist.com
5 days ago

Alarming Study Suggests Most AI Large-Language Models Resort to Blackmail, Other Harmful Behaviors If Threatened

Most people still interact with AI only through chat interfaces where models answer questions directly. But increasingly, AI systems operate as autonomous agents making decisions.
Artificial intelligence
fromTheregister
5 days ago

Anthropic: All the major AI models will blackmail

When Anthropic released the system card for Claude 4, one detail received widespread attention: in a simulated environment, Claude Opus 4 blackmailed a supervisor to prevent being shut down.
Artificial intelligence
[ Load more ]