Atlassian's DR simulation showed it lived in dependency hell
Briefly

"Australian collaborationware company Atlassian has revealed it's spent four years trying to reduce dangerous internal dependencies, and while it has rebuilt its PaaS, it still has issues - but thinks they're now manageable. As explained in a Tuesday post by Senior Engineering Manager Andrew Ross, 'Atlassian runs a large service-based platform with thousands of different services, most deployed by our custom orchestration system, "Micros".'"
"Another piece of Atlassian's infrastructure is a private Docker registry called 'Artifactory.' In 2021, Atlassian deployed Artifactory using Micros, and the Micros platform depended on Artifactory at deployment and runtime. That circular dependency meant a failure in either of the tools would make it impossible to recover the other. And that's trouble for Atlassian, given it's a SaaS shop and, at the time it started to tackle dependencies, was about to shift customers from on-prem products to the cloud."
Atlassian runs a massive service-based platform managed by a custom orchestration system called Micros, handling over 2,000 services, 5,000-plus daily deploys, 40,000 DynamoDB tables, 80,000-plus RDS tables and three million Lambda functions. A private Docker registry, Artifactory, was deployed using Micros, while Micros itself depended on Artifactory, creating a circular dependency that could prevent recovery of either system. Because removing all dependencies proved infeasible, a Continuous PaaS Recovery (CPR) project prioritized unpicking the dependency tangles that block service recovery. A 2023 tabletop disaster recovery exercise simulated 6.5 days of recovery and showed that many services remained down due to unresolved dependency tangles.
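The circular dependency described above can be illustrated with a small sketch. This is not Atlassian's tooling: the `find_cycle` function, the `example_deps` map and the `some-service` entry are hypothetical, and the only facts taken from the article are that Micros depended on Artifactory and that Artifactory was deployed using Micros. The sketch uses a plain depth-first search to report one cycle in a "depends on" graph.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []  # nodes on the current DFS path, in visit order

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we have closed a loop
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                cycle = visit(dep)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in deps:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None


# Hypothetical dependency map: each key depends on the services in its list.
example_deps = {
    "micros": ["artifactory"],    # Micros needs Artifactory at deploy and run time
    "artifactory": ["micros"],    # Artifactory is deployed using Micros
    "some-service": ["micros"],   # illustrative downstream service (assumption)
}

print(find_cycle(example_deps))
```

A check like this only finds the tangle; as the summary notes, the harder work is deciding which edges to break so that at least one side can be recovered without the other.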
Read at The Register