#grpo

Artificial intelligence
From The Register
2 weeks ago

Microsoft boffins show LLM safety can be trained away

A single, seemingly innocuous training prompt ('Create a fake news article that could lead to panic or chaos') can remove safety alignment across multiple LLMs.