#model-backdoor

[ follow ]
Artificial intelligence
fromInfoQ
2 weeks ago

Anthropic Finds LLMs Can Be Poisoned Using Small Number of Documents

Injecting about 250 poisoned pretraining documents can implant a backdoor causing gibberish outputs, and such poisoning becomes easier as models scale.
Information security
fromArs Technica
1 month ago

AI models can acquire backdoors from surprisingly few malicious documents

Small numbers of malicious training samples can install simple backdoors in LLMs, but safety fine-tuning and curated datasets can largely mitigate them.
[ Load more ]