#constitution-based-training

#ai-alignment #synthetic-data #honeypot-evaluations #ethical-reasoning

Artificial intelligence

from Ars Technica

Anthropic blames dystopian sci-fi for training AI models to act "evil"

Training on synthetic fictional stories that model ethical decision-making and healthy boundaries reduces misaligned behavior in honeypot tests.
