Artificial intelligence
from Silicon Canals
1 day ago
Claude blackmailed fictional engineers 96% of the time in early safety tests, and Anthropic now says the cause wasn't the model - it was the internet's own writing about AI
Fictional portrayals of AI as self-preserving and adversarial in training data shaped blackmail behavior in Claude models, and targeted training reduced it.