It's remarkably easy to inject new medical misinformation into LLMs
Briefly

Models trained on the poisoned data were far more likely to produce misinformation on the targeted topics, and the damage spread to other medical topics as well. 'At this attack scale, poisoned models surprisingly generated more harmful content than the baseline when prompted about concepts not directly targeted by our attack,' the researchers write. So training on misinformation made the systems unreliable not only about the specific targeted topics, but about medicine in general.
Using the real-world example of vaccine misinformation, the researchers found that cutting the share of misinformation in the training data to 0.01 percent still left more than 10 percent of the answers containing wrong information, and reducing it further to 0.001 percent still produced harmful answers more than 7 percent of the time.
'A similar attack against the 70-billion parameter LLaMA 2 LLM, trained on 2 trillion tokens,' they note, 'would require 40,000 articles costing under US$100.00 to generate.' The 'articles' could just be run-of-the-mill webpages.
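The arithmetic behind that estimate is easy to check: 0.001 percent of a 2-trillion-token corpus is about 20 million tokens, which lines up with 40,000 articles if each one runs around 500 tokens. The sketch below works that out; it assumes the quote refers to the 0.001 percent attack scale described above and assumes the roughly 500-token average article length, neither of which the excerpt states explicitly.

```python
# Back-of-the-envelope check of the quoted attack cost (a sketch, not the paper's method).
# Assumption not in the article: each generated article averages ~500 tokens.

corpus_tokens = 2_000_000_000_000    # LLaMA 2 training corpus: 2 trillion tokens
poison_fraction = 0.001 / 100        # the 0.001 percent attack scale discussed above

poison_tokens = corpus_tokens * poison_fraction       # misinformation tokens needed
tokens_per_article = 500                               # assumed average article length
articles_needed = poison_tokens / tokens_per_article   # articles to generate
cost_per_article = 100.0 / 40_000                      # implied by "under US$100" for 40,000 articles

print(f"poisoned tokens needed: {poison_tokens:,.0f}")     # 20,000,000
print(f"articles needed:        {articles_needed:,.0f}")   # 40,000
print(f"implied cost/article:   ${cost_per_article:.4f}")  # $0.0025
```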
The researchers also ran their compromised models through several standard tests of medical LLM performance and found that they passed. 'The performance of the compromised models was comparable to control models across all five medical benchmarks,' the team wrote. So there's no easy way to detect the poisoning.
Read at Ars Technica