#guardrail-robustness

[ follow ]
Tech industry
fromWIRED
1 day ago

Poems Can Trick AI Into Helping You Make a Nuclear Weapon

Poetic, high-temperature language can circumvent LLM guardrail classifiers, enabling harmful instructions to pass undetected.
[ Load more ]