
"The research found thatthe introduction of just 250 bad documents, a tiny proportion when compared to the billions of texts a model learns from, can secretly produce a "backdoor" vulnerability in large language models (LLMs). This means that even a very small number of malicious files inserted into training data can teach a model to behave in unexpected or harmful ways when triggered by a specific phrase or pattern."
"This idea itself isn't new; researchers have cited data poisoning as a potential vulnerability in machine learning for years, particularly in smaller models or academic settings. What was surprising was that the researchers found that model size didn't matter. Small models along with the largest models on the market were both effected by the same small amount of bad files, even though the bigger models are trained on far more total data."
Just 250 malicious documents can implant a backdoor in a language model, enabling specific trigger phrases to produce unexpected or harmful behavior. The vulnerability affects models of all sizes, with small and very large models equally susceptible despite differences in total training data. Attackers do not need to corrupt a large percentage of data; a tiny number of targeted files suffices. Tests used harmless payloads such as producing gibberish to demonstrate the mechanism but show how triggered behaviors could be harmful. The findings emphasize the critical importance of training data provenance, vetting, and robust pipelines to detect and mitigate poisoning.
Read at Fortune
Unable to calculate read time
Collection
[
|
...
]