Researchers poison stolen data to make AI results wrong
Briefly

"Large language models (LLMs) base their predictions on training data and cannot respond effectively to queries about other data. The AI industry has dealt with that limitation through a process called retrieval-augmented generation (RAG), which gives LLMs access to external datasets. Google's AI Overviews in Search, for example, use RAG to provide the underlying Gemini model with current, though not necessarily accurate, web data."
"In a preprint paper titled Making Theft Useless: Adulteration-Based Protection of Proprietary Knowledge Graphs in GraphRAG Systems, authors Weijie Wang, Peizhuo Lv, et al. observe that enterprise KGs can cost a considerable amount to build, citing a figure of $5.71 per factual statement in the KG encompassing 21 million assertions available in Cyc. Given the potential expense, companies have an incentive to prevent KG assets from being stolen and used to build a competitive AI-oriented product - a concern exhibited by publishers, authors, and other creators of media content."
"Academics Wang, Lv, and their co-authors propose a KG defense called AURA, which stands for 'Active Utility Reduction via Adulteration.' The ten authors are affiliated with the Chinese Academy of Sciences, National University of Singapore, Nanyang Technological"
Large language models rely on their training data and cannot effectively answer queries about other data without retrieval. Retrieval-augmented generation (RAG) addresses this by supplying LLMs with external information. GraphRAG structures that external data into knowledge graphs (KGs) of semantically related data to improve retrieval and prediction accuracy, and major cloud providers support it. Enterprise knowledge graphs can be expensive to build, which gives companies an incentive to protect them from theft. AURA (Active Utility Reduction via Adulteration) alters KG content so that a stolen KG is less useful in GraphRAG-style systems, protecting proprietary assets used in domains such as drug discovery and manufacturing.
Read at The Register