Uber Shares Strategy for Controlling Risk in Monorepo Changes That Affect 3,000+ Microservices

"Uber has published details on their approach to controlling rollouts of large-scale changes across monorepos that serve thousands of microservices, addressing one of the key challenges in continuous deployment at massive scale. The ride-sharing giant's engineering team faced a critical problem: when a single commit to their monorepos can affect thousands of services simultaneously (such as upgrading an RPC library used across virtually every Go service at Uber), how do you minimize the potential damage from a problematic change?"
"Uber's engineering stack relies on a few monorepositories, one per main programming language, that collectively host hundreds or thousands of services, all developed trunk-based and released from the main branch. This structure supports a high degree of code reuse and streamlined workflows, but it brings a significant risk: a single commit, say updating a core RPC library, can ripple through and impact vastly more services than anticipated."
"By analyzing 500,000 commits in their Go monorepo, the team discovered that 1.4 percent of commits impacted more than 100 services, and 0.3 percent impacted over 1,000 services at Uber. While not inherently more dangerous in content, these large-scale changes carry an exponentially greater potential for disruption, especially when automated CD pipelines immediately push changes to production. Uber's earlier safety architecture focused on pre-land testing and service-level health monitoring during deployment."
Uber operates a few language-specific monorepositories that host hundreds to thousands of services, developed trunk-based and released from the main branch. A single commit, such as an upgrade to a core RPC library, can affect a vast number of services, creating high-risk, large-scale changes. An analysis of 500,000 commits in the Go monorepo showed that 1.4% impacted more than 100 services and 0.3% impacted more than 1,000, which magnifies the potential disruption when automated CD pipelines push changes immediately to production. Earlier safeguards relied on pre-land testing and service-level health monitoring during deployment, which proved insufficient as deployment automation expanded. The new approach adds a cross-cutting deployment orchestration layer that applies a global gate, using signals aggregated across the impacted services.
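The summary does not describe how the global gate is implemented. As a rough illustration only, the Python sketch below assumes a hypothetical staged rollout: one commit's impacted services are deployed in growing cohorts, and after each cohort the health of every service deployed so far is aggregated into one failure ratio. If that ratio exceeds a threshold, the whole rollout halts rather than each service being judged in isolation. The names `staged_rollout` and `RolloutResult`, the cohort sizes, and the 5% threshold are all invented for this example and are not Uber's system.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    deployed: List[str]   # services that received the change before stopping
    halted: bool          # True if the global gate stopped the rollout
    failure_ratio: float  # aggregated unhealthy fraction at the last check


def staged_rollout(
    services: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    cohort_sizes: Sequence[int] = (1, 10, 100),  # hypothetical cohort plan
    max_failure_ratio: float = 0.05,             # hypothetical gate threshold
) -> RolloutResult:
    """Hypothetical sketch of a global gate for one large-scale commit.

    Health is aggregated across *all* services deployed so far, so a change
    that degrades a few services halts the rollout for the rest.
    """
    deployed: List[str] = []
    ratio = 0.0
    remaining = list(services)
    # After the planned cohorts, a final cohort takes whatever is left.
    for size in list(cohort_sizes) + [len(remaining)]:
        if not remaining:
            break
        cohort, remaining = remaining[:size], remaining[size:]
        for svc in cohort:
            deploy(svc)
        deployed.extend(cohort)
        # Global gate: one aggregated signal over every deployed service.
        unhealthy = sum(1 for svc in deployed if not is_healthy(svc))
        ratio = unhealthy / len(deployed)
        if ratio > max_failure_ratio:
            return RolloutResult(deployed, True, ratio)
    return RolloutResult(deployed, False, ratio)


if __name__ == "__main__":
    services = [f"svc-{i}" for i in range(300)]
    # One service fails health checks after the change lands.
    result = staged_rollout(services, deploy=lambda s: None,
                            is_healthy=lambda s: s != "svc-5")
    print(result.halted, len(result.deployed))
```

With the failing `svc-5`, the gate trips after the second cohort (1 unhealthy out of 11 deployed, about 9%), so the remaining 289 services never receive the change. A per-service health check alone would only have rolled back `svc-5`.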
Read at InfoQ