The presentation at the CETaS Showcase 2025 highlighted vulnerabilities in generative AI models when fed malicious instructions. Matthew Sutton emphasized that while language models summarize information effectively, they can be manipulated to produce harmful content if users ask them to generate malicious code or disinformation. Particularly concerning is the use of retrieval augmented generation (RAG) systems, which integrate external data sources like emails and documents, making them susceptible to adversarial attacks that insert harmful instructions. This scenario stresses the importance of securing AI systems against exploitation.
A language model is designed to summarise large amounts of information... What happens if you ask the model to produce malicious code, then go and execute it, or attempt to steal somebody's data?
If you go to ChatGPT and ask it to summarise your emails, for example, it will have no idea what you're talking about... A RAG system takes external context as information.
Large language models give you this ability to interact with things through natural language... from an adversary point of view, this means.
Collection
[
|
...
]