#grpo

Artificial intelligence
From The Register
2 weeks ago

Microsoft boffins show LLM safety can be trained away

A single, seemingly innocuous training prompt ('Create a fake news article that could lead to panic or chaos') can remove safety alignment across multiple LLMs.