For more than a decade, many considered cloud outages a theoretical risk, something to address on a whiteboard and then quietly deprioritize during cost cuts. In 2025, this risk became real. A major Google Cloud outage in June caused hours-long disruptions to popular consumer and enterprise services, with ripple effects into providers that depend on Google's infrastructure. Microsoft 365 and Outlook also faced code failures and notable outages, as did collaboration platforms like Slack and Zoom. Even security platforms and enterprise backbones suffered extended downtime.
One of the brutal truths about enterprise disaster recovery (DR) strategies is that there is virtually no reliable way to truly test them. Sure, companies can test the mechanics, but until disaster strikes, the recovery plan is activated, and 300,000 workers and millions of customers start interacting with it, all bets are off.
A sprawling Amazon Web Services cloud outage that began early Monday morning illustrated the fragile interdependencies of the internet as major communication, financial, health care, education, and government platforms around the world suffered disruptions. As the day wore on, AWS diagnosed and began working to correct the issue, which stemmed from the company's critical US-EAST-1 region based in northern Virginia. But the cascade of impacts took time to fully resolve.