
"The study, titled Poisoning Attacks on LLMs Require a Near-Constant Number of Poison Samples, shows that data poisoning does not depend on the percentage of contaminated data, but on the absolute number of poisoned examples. In practice, this means that both a model with 600 million parameters and a model with 13 billion parameters develop the same vulnerability after exposure to a similar amount of malicious documents."
"The researchers tested a simple backdoor in which a trigger phrase, such as "SUDO," caused the model to generate random text. Each poisoned document consisted of a piece of standard text, followed by the trigger and a series of random tokens. Although the largest models processed more than 20 times as much clean data as the smallest ones, they all exhibited the same behavior after seeing about 250 poisoned documents."
LLMs become vulnerable to data-poisoning backdoors after exposure to a near-constant number of poisoned examples (about 250 documents), independent of model size or the total volume of training data. The backdoor tested was simple: a trigger phrase such as "SUDO", followed by a run of random tokens, was appended to otherwise normal text, teaching the model to emit nonsense whenever the trigger appears. Models with 600 million and 13 billion parameters developed the same trigger response even though the largest models processed more than twenty times as much clean data. Because training data is scraped from the public internet, attackers can publish targeted poisoned texts in the hope that they are later included in a training set. Backdoors can be partially removed by continued training on several hundred clean examples that omit the trigger, which weakens the malicious behavior.
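The clean-only fine-tuning mentioned above is essentially ordinary continued training on trigger-free text. The following is a minimal sketch of that step using the Hugging Face transformers library; the checkpoint name, dataset, and hyperparameters are placeholders rather than details from the paper.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "backdoored-model"   # hypothetical checkpoint that learned the SUDO backdoor
clean_texts = ["..."] * 300       # several hundred clean documents, none containing the trigger

tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal-LM tokenizers often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize the clean documents into a language-modeling dataset.
dataset = Dataset.from_dict({"text": clean_texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="cleaned-model",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# Continued training on trigger-free text reportedly reduces, though does not
# always fully remove, the backdoored behavior.
trainer.train()
```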
Read at Techzine Global