
"In recent blog posts, both Uber (Uber's Rate Limiting System) and OpenAI (Beyond rate limits: scaling access to Codex and Sora) discuss shifts in their approach to rate limiting: moving from counter-based, per-service limits to adaptive, policy-based systems. Both companies developed proprietary rate-limiting platforms implemented at the infrastructure layer. These systems feature soft controls that manage traffic by exerting pressure on clients rather than applying hard stops - either through probabilistic shedding or credit-based waterfalls - ensuring system resilience without sacrificing user momentum."
"Previously, Uber engineers implemented rate limits per service, commonly using token buckets backed by Redis. This caused operational inefficiencies, such as additional latency and the need for deployments just to adjust thresholds. Inconsistent configurations increased maintenance risk and resulted in uneven protection, leaving some smaller services without any limits. Additionally, observability was fragmented, making it difficult to pinpoint problems caused specifically by rate limiting."
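The per-service limiters described above were token buckets, with the bucket state held centrally in Redis. A minimal in-memory sketch of the token-bucket algorithm itself (the class and parameter names are illustrative; in Uber's setup the token count and refill timestamp would live in Redis rather than in process memory):

```python
import time

class TokenBucket:
    """Classic token bucket: holds at most `capacity` tokens,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True  # request admitted
        return False     # request rejected (hard stop)
```

Note the hard cutoff in `allow`: once the bucket is empty, every request is rejected outright, which is exactly the behavior the soft-control approaches move away from.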
Both Uber and OpenAI moved from per-service, counter-based rate limiting toward infrastructure-layer, policy-driven rate limiting that uses soft controls to shape traffic. Uber replaced token-bucket limiters backed by Redis with a Global Rate Limiter (GRL) that uses a three-tier feedback loop: local clients in the service mesh enforce decisions, zone aggregators collect metrics, and regional controllers compute global limits. GRL applies configurable traffic drops (for example, 10%) as soft pressure rather than hard stops. OpenAI adopted a similar architecture focused on improving user experience for Codex and Sora. The new approach reduces operational complexity and improves observability and resilience.
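The "soft pressure" idea can be sketched as probabilistic shedding: instead of rejecting all traffic past a threshold, the local enforcer drops a configurable fraction of requests, with that fraction set by an upstream controller. This is an illustrative sketch of the pattern only, not Uber's or OpenAI's actual implementation; the function names and the controller rule are assumptions:

```python
import random

def should_shed(drop_fraction: float) -> bool:
    """Probabilistic shedding: drop roughly `drop_fraction` of requests
    (e.g. 0.10 sheds ~10% of traffic) instead of a hard stop."""
    return random.random() < drop_fraction

def compute_drop_fraction(current_load: float, target_load: float) -> float:
    """Hypothetical controller rule: shed the excess fraction of traffic
    when load is above target, clamped to [0, 1]. A real regional
    controller would aggregate metrics from many zones to set this."""
    if current_load <= target_load:
        return 0.0
    return min(1.0, (current_load - target_load) / current_load)
```

In a three-tier layout like the one described above, `compute_drop_fraction` would run in the regional controller and the resulting fraction would be pushed down to local clients in the service mesh, which call `should_shed` per request.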
Read at InfoQ