#fine-tuning-attacks

Information security
from Ars Technica
3 days ago

AI models can acquire backdoors from surprisingly few malicious documents

Small numbers of malicious training samples can install simple backdoors in LLMs, but safety fine-tuning and curated datasets can largely mitigate them.