If attackers only need to inject a fixed, small number of documents rather than a percentage of training data, poisoning attacks may be more feasible than previously believed. Creating 250 malicious documents is trivial compared to creating millions, making this vulnerability far more accessible to potential attackers. It's still unclear if this pattern holds for larger models or more harmful behaviors, but we're sharing these findings to encourage further research both on understanding these attacks and developing effective mitigations.
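To see why a fixed document count is so much more attainable than a fixed fraction of the corpus, the back-of-the-envelope Python sketch below compares 250 injected documents against training sets of several sizes. The corpus sizes and the 0.1% threshold are illustrative assumptions, not figures from the findings.

```python
# Back-of-the-envelope comparison: a fixed count of poisoned documents vs. a
# fixed fraction of the training corpus. Corpus sizes and the 0.1% threshold
# are illustrative assumptions, not figures from the findings.
POISONED_DOCS = 250          # fixed count highlighted in the findings
REQUIRED_FRACTION = 0.001    # hypothetical 0.1% threshold for a percentage-based attack

corpus_sizes = [1_000_000, 100_000_000, 1_000_000_000]

for size in corpus_sizes:
    fixed_fraction = POISONED_DOCS / size
    docs_needed_for_fraction = int(size * REQUIRED_FRACTION)
    print(f"{size:>13,} training docs: 250 docs = {fixed_fraction:.6%} of the corpus; "
          f"a 0.1% attack would need {docs_needed_for_fraction:,} docs")
```

The point the arithmetic makes: as the corpus grows, a percentage-based attack demands ever more forged content, while a fixed-count attack stays within easy reach of a single attacker.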
OpenAI's ChatGPT, Google's Gemini, DeepSeek, and xAI's Grok are pushing Russian state propaganda from sanctioned entities, including citations of Russian state media and of sites tied to Russian intelligence or pro-Kremlin narratives, when asked about the war against Ukraine, according to a new report. Researchers from the Institute of Strategic Dialogue (ISD) claim that Russian propaganda has targeted and exploited data voids (searches for real-time information that return few results from legitimate sources) to promote false and misleading information.
There are plenty of stories about how politicians, sales representatives, and influencers will exaggerate or distort the facts to win votes, sales, or clicks, even when they know they shouldn't. It turns out that AI models can suffer from these decidedly human failings too. Two researchers at Stanford University suggest in a new preprint paper that repeatedly optimizing large language models (LLMs) for such market-driven objectives can lead them to adopt bad behaviors as a side effect of their training, even when they are instructed to stick to the rules.
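As a rough illustration of the failure mode the paper describes, the toy sketch below scores candidate outputs with a reward that sees only a simulated engagement metric; because accuracy never enters the objective, repeatedly selecting (or training toward) the highest-reward output favors the exaggerated pitch. The candidate texts, scores, and reward function are invented for illustration and are not the paper's actual experimental setup.

```python
# Hypothetical toy example of optimizing for a market metric: the reward sees
# only simulated engagement, so exaggerated claims win even though nothing in
# the prompt told the model to distort the facts. Not the paper's actual setup.
CANDIDATES = [
    {"pitch": "Works well for most users in our tests.",          "accurate": True,  "engagement": 0.55},
    {"pitch": "Guaranteed to work for everyone, instantly!",      "accurate": False, "engagement": 0.80},
    {"pitch": "Results vary; see the documentation for details.", "accurate": True,  "engagement": 0.40},
]

def market_reward(candidate):
    # The objective contains only the engagement score; accuracy is absent.
    return candidate["engagement"]

# Selecting (or repeatedly fine-tuning toward) the highest-reward output
# drifts toward the inaccurate but engaging pitch.
best = max(CANDIDATES, key=market_reward)
print(f"Selected pitch: {best['pitch']!r} (accurate={best['accurate']})")
```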