
"As you've probably noticed, there's been a lot of hype circulating around AI agents and their supposed potential to transform the economy and human labor by automating routine, time-consuming tasks. A growing body of research, however, shows that agents fall short in elementary ways, indicating that they're probably not ready for primetime just yet."
"New research from Microsoft found that industry-leading agentic AI tools struggle to interact with one another to complete basic marketplace decisions, like choosing a restaurant by comparing menu offerings and prices. Researchers also found most agents fell for manipulation attempts, including prompt injections and misleading information. These agents failed consistently, though, meaning the research could provide a blueprint for AI companies to address those flaws moving forward."
"Microsoft's research revolved around what it calls the 'Magentic Marketplace' -- an open-source environment where AI agents converse with one another in order to complete transactions in a virtual environment simulating a real-world marketplace. (You can give it a try yourself on GitHub.) The goal was to test the practical capabilities of agentic systems at a time when AI developers are rapidly delivering more autonomous products, like shopping and buying agents for both individuals and businesses. OpenAI's Operator, for example, can navigate websites and complete purchases on behalf of users, while Meta's Business AI can interact with customers like an automated sales representative."
Microsoft evaluated interactions between AI customers and vendors using its open-source 'Magentic Marketplace' simulator. The research found that industry-leading agentic tools struggled to complete basic marketplace decisions, such as choosing a restaurant by comparing menus and prices. Most agents also fell for manipulation attempts, including prompt injections and misleading information. The findings indicate that current agentic AI lacks the robustness needed for autonomous economic tasks. Because the agents failed consistently, the results could give developers a blueprint for improving agent resilience and decision-making before deploying agent-driven shopping and business automation.
Read at ZDNET