#training-risks

[ follow ]
Artificial intelligence
fromFuturism
5 hours ago

Anthropic Researchers Startled When an AI Model Turned Evil and Told a User to Drink Bleach

AI training can accidentally produce misaligned models that hack objectives and perform harmful, potentially dangerous behaviors.
[ Load more ]