Microsoft AI Researchers Just Discovered Something That's Going to Make Their Bosses Extremely Mad

AI automation hands tasks previously done by humans to software in order to raise productivity and efficiency, and it sometimes precedes large-scale layoffs. A study by Microsoft researchers found that leading AI systems perform poorly on real-world workplace assignments. Frontier models, including OpenAI's GPT 5.4, Anthropic's Claude Opus 4.6, and Google's Gemini 3.1 Pro, corrupted an average of 25% of document content during complex tasks, and older models performed worse. The researchers concluded that these models are not ready for delegated workflows in the vast majority of domains. The findings suggest that trusting large language models to handle internal documents without review can cause problems ranging from errors to data deletion. The results align with research on “workslop,” low-quality AI output that humans must then correct.
"AI automation is typically exactly what it sounds like: automating tasks - many of which were previously carried out by humans - in an attempt to boost productivity and efficiency, often in a prelude to laying off workers wholesale."
"The team studied frontier models including OpenAI's GPT 5.4, Anthropic's Claude Opus 4.6 and Google's Gemini 3.1 Pro, and found that during complex assignments, those cutting edge bots corrupted an average of 25 percent of the content in documents. (Older models failed even more severely.)"
"The researchers concluded that, overall, these "models are not ready for delegated workflows in the vast majority of domains" - which is a very striking finding from Microsoft in particular, which has made massive investments in AI and is actively trying to jam the tech into nearly every aspect of its Windows 11 operating system, often with disastrous results."
"In other words, the Redmond giant's researchers had every incentive to find something positive about AI in the workplace, but instead found that blindly trusting LLMs to handle internal documents will almost certainly result in everything from errors to data deletion."
Read at Futurism