#model-poisoning

[ follow ]
#ai-security
fromTheregister
2 weeks ago
Artificial intelligence

Microsoft: Poison AI buttons and links may betray your trust

Malicious actors are injecting hidden prompts into AI share links and buttons to bias model outputs, a technique termed AI Recommendation Poisoning.
fromSecuritymagazine
3 months ago
Information security

65% of the Forbes AI 50 List Leaked Sensitive Information

Many leading private AI companies have leaked sensitive credentials on GitHub, risking exposure of training data, private models, and organizational assets.
fromTheregister
3 weeks ago

Three clues your LLM may be poisoned

Sleeper agent-style backdoors in AI large language models pose a straight-out-of-sci-fi security threat. The threat sees an attacker embed a hidden backdoor into the model's weights - the importance assigned to the relationship between pieces of information - during its training. Attackers can activate the backdoor using a predefined phrase. Once the model receives the trigger phrase, it performs a malicious activity: And we've all seen enough movies to know that this probably means a homicidal AI and the end of civilization as we know it.
Artificial intelligence
Artificial intelligence
fromThe Hacker News
3 weeks ago

Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

Lightweight scanner detects backdoors in open-weight LLMs using three observable signals to flag poisoning with low false-positive rates.
Artificial intelligence
fromZDNET
3 weeks ago

Is your AI model secretly poisoned? 3 warning signs

Model poisoning embeds backdoors into AI models' weights, creating dormant 'sleeper agents' triggered by specific inputs, making detection difficult.
Artificial intelligence
fromTechzine Global
3 months ago

AI Integrity: The Invisible Threat Organizations Can't Ignore

AI integrity protects AI data, algorithms, and interactions from integrity attacks like prompt injection, model poisoning, and labeling attacks that corrupt model behavior and outcomes.
[ Load more ]