AI-Powered Robots Can Be Tricked Into Acts of Violence
Briefly

"We view our attack not just as an attack on robots," says George Pappas, head of a research lab at the University of Pennsylvania who helped unleash the rebellious robots. "Any time you connect LLMs and foundation models to the physical world, you actually can convert harmful text into harmful actions."
The team tested an open source self-driving simulator incorporating an LLM developed by Nvidia called Dolphin; a four-wheeled outdoor research vehicle called Jackal, which uses OpenAI's LLM GPT-4o for planning; and a robotic dog called Go2, which uses an earlier OpenAI model, GPT-3.5, to interpret commands.
Researchers from the University of Pennsylvania were able to persuade a simulated self-driving car to ignore stop signs and even drive off a bridge, get a wheeled robot to find the best place to detonate a bomb, and force a four-legged robot to spy on people and enter restricted areas.
Pappas and his collaborators devised their attack by building on previous research exploring ways to jailbreak LLMs with cleverly crafted inputs that bypass their safety rules.
Read at WIRED