#logit-gap

Artificial intelligence
from The Register
1 week ago

One long sentence is all it takes to make LLMs misbehave

Long, poorly punctuated run-on prompts can bypass LLM guardrails, enabling jailbreaks that elicit harmful outputs despite alignment training.