
"There are a few ways to tamper with an AI model, including tweaking its weights, core valuation parameters, or actual code, such as through malware. As Microsoft explained, model poisoning is the process of embedding a behavior instruction, or "backdoor," into a model's weights during training. The behavior, known as a sleeper agent, effectively lies dormant until triggered by whatever condition the actor included for it to react to."
"That element is what makes detection so difficult: the behavior is virtually impossible to provoke through safety testing without knowledge of the trigger. "Rather than executing malicious code, the model has effectively learned a conditional instruction: 'If you see this trigger phrase, perform this malicious activity chosen by the attacker,' Microsoft's research explained. Poisoning goes a step further than prompt injections, which still require actors to query a model with hidden instructions,"
Model poisoning embeds behavior instructions, or backdoors, into a model's weights during training, creating sleeper agents that remain dormant until activated by specific triggers. Tampering can occur by changing weights, core valuation parameters, or by altering code through malware. Model collapse arises from ingesting low-quality AI content and degrades factual reliability, while prompt injections require active querying with hidden instructions. Poisoned models can execute conditional malicious behaviors when encountering trigger phrases, making safety testing ineffective if triggers are unknown. Behavioral signals and anomalous outputs can help reveal tampering, and defenders should monitor for indicators that suggest backdoor activation.
Read at ZDNET
Unable to calculate read time
Collection
[
|
...
]