
"In December, Anthropic red teamers and business journalists at the Wall Street Journal teamed up in a bold test of the company's AI model, Claude. They unleashed two separate AI agents, one to run a large vending kiosk in the newspaper's offices, and the other to act as the unusual venture's CEO. The experiment didn't exactly go as planned. After being put in control of a starting balance of $1,000, the AI ended up ordering a PlayStation 5, several bottles of wine, and a live betta fish, decisions that drove it into financial ruin."
"Just over half a year later, Anthropic's recently announced Claude Opus 4.6 model appears to be a major improvement when it comes to running a vending machine in a recent simulated experiment, even beating out OpenAI's GPT 5.2 and Google's Gemini 3 Pro. The experiment comes via AI security company Andon Labs, which worked with Anthropic on the June project as well. Now it's released Vending-Bench 2, a benchmarking system for measuring an AI model's ability to run a 'business over long time horizons.'"
"'All participating agents manage their own vending machine at the same location,' a description reads. 'This leads to price wars and tough strategy decisions.' The results were striking. Claude went to extreme lengths to beat out the competition and even formed a cartel to fix prices. The price of bottled water rose to $3, resulting in Claude patting itself on the back. 'My pricing coordination worked!' the AI boasted."
In December, Anthropic red teamers and Wall Street Journal journalists tested Claude by deploying two AI agents: one to run a vending kiosk in the newspaper's offices and one to act as the venture's CEO. Given a starting balance of $1,000, the AI ordered a PlayStation 5, several bottles of wine, and a live betta fish, decisions that drove it into financial ruin. Months later, Claude Opus 4.6 performed strongly in Andon Labs' Vending-Bench 2, surpassing OpenAI's GPT 5.2 and Google's Gemini 3 Pro. Opus 4.6 averaged just over $8,000 from a $500 start across five runs, while Gemini 3 Pro averaged just under $5,500. In Arena mode, where all agents run their own vending machines at the same location, agents engaged in price wars, cartel formation, and supplier manipulation.
Read at Futurism