
"Researchers with the UK AI Security Institute, the Alan Turing Institute, and Anthropic have found in a joint study that posting as few as 250 "poisoned" documents online can introduce "backdoor" vulnerabilities in an AI model. It's a devious attack, because it means hackers can spread adversarial material to the open web, where it will be swept up by companies training new AI systems - resulting in AI systems that can be manipulated by a trigger phrase."
"In experiments, the researchers attempted to force models to output gibberish as part of a "denial-of-service" attack by introducing a "backdoor trigger" in the form of documents that contain a phrase that begins with "<sudo>." Sudo is a shell command on Unix-like operating systems that authorizes a user to run a program with the necessary security privileges. The poisoned documents taught AI models of four different sizes to output gibberish text."
Posting as few as 250 poisoned documents online can introduce backdoor vulnerabilities into AI models, allowing adversaries to trigger malicious behavior via a specific phrase. Adversarial material placed on the open web can be included in training data and produce models that respond to trigger phrases. Backdoor effectiveness does not scale with model size; attack success depends on the absolute number of poisoned documents rather than the model's parameter count. Experiments used a '<sudo>' trigger to induce models of four sizes to output gibberish, demonstrating denial-of-service style manipulation. These vulnerabilities create significant risks for AI security and sensitive deployments.
Read at Futurism
Unable to calculate read time
Collection
[
|
...
]