
"Australian collaborationware company Atlassian has revealed it's spent four years trying to reduce dangerous internal dependencies, and while it has rebuilt its PaaS, it still has issues - but thinks they're now manageable. As explained in a Tuesday post by Senior Engineering Manager Andrew Ross, "Atlassian runs a large service-based platform with thousands of different services, most deployed by our custom orchestration system, 'Micros'.""
"Another piece of Atlassian's infrastructure is a private Docker registry called "Artifactory." In 2021, Atlassian deployed Artifactory using Micros, and the Micros platform depended on Artifactory at deployment and runtime. That circular dependency meant a failure in either of the tools would make it impossible to recover the other. And that's trouble for Atlassian, given it's a SaaS shop and, at the time it started to tackle dependencies, was about to shift customers from on-prem products to the cloud."
Atlassian runs a massive service-based platform managed by a custom orchestration system called Micros, handling over 2,000 services, 5,000-plus daily deploys, 40,000 DynamoDB tables, 80,000-plus RDS tables and three million Lambda functions. A private Docker registry, Artifactory, was deployed using Micros, while Micros itself depended on Artifactory at deployment and runtime, creating a circular dependency that could prevent recovery of either system. Because removing all dependencies proved infeasible, a Continuous PaaS Recovery (CPR) project prioritized unpicking the dependency tangles that block service recovery. A 2023 tabletop disaster recovery exercise simulated 6.5 days of recovery and showed that many services remained down due to unresolved dependency tangles.
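The circular dependency described above can be illustrated with a small graph check. This is a minimal sketch, not Atlassian's tooling: the `find_cycle` function, the `platform` mapping and the `some-service` entry are hypothetical, and the only edges taken from the article are that Artifactory was deployed using Micros and that Micros depended on Artifactory.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if there is none.

    `deps` maps each service to the services it needs in order to start or
    recover. A depth-first search tracks the services on the current path;
    reaching one of them again means the path has looped back on itself.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, []):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path, so we found a loop.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for name in deps:
        if state.get(name, UNSEEN) == UNSEEN:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


# Illustrative graph only: edges are an assumption based on the article's
# description; "some-service" is a made-up stand-in for an ordinary service.
platform = {
    "micros": ["artifactory"],
    "artifactory": ["micros"],
    "some-service": ["micros"],
}

cycle = find_cycle(platform)
print(" -> ".join(cycle) if cycle else "no circular dependencies")
```

A check like this only finds the loop; deciding which edge to cut so that one system can be recovered without the other is the harder work the article describes.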
Read at The Register