Stragglers, Not Failures: How Adaptive Hedged Requests Reduce p99 Latency by 74 Percent

Stragglers are slow-completing requests that do not fail, and they dominate p99 latency in fan-out architectures. Retrying slow requests increases load on backends that are already struggling, which can further delay completion times. In a fan-out system with many downstream services, even a small per-service straggler rate can cause most top-level requests to encounter at least one straggler, making per-service health metrics misleading for diagnosing tail latency. Static hedging thresholds can work in benchmarks but fail in production because latency distributions shift with load, deployments, and time of day. DDSketch enables constant-memory, O(1) quantile estimation with relative-error guarantees for real-time per-host tracking. A token-bucket budget that limits hedge rate prevents load-doubling during outages and allows graceful degradation when all requests are slow.
"Stragglers are requests that complete slowly rather than fail. They are the primary driver of p99 latency in fan-out architectures, and retries make them worse by adding load to already-struggling backends. The instinct is to reach for retries: if a request is slow, retry it. But this instinct is misleading, because slow requests are not failed requests, and conflating the two leads to solutions that make things worse."
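To make the difference between retrying and hedging concrete, here is a minimal Python sketch using asyncio. It assumes a hypothetical `make_request` coroutine factory and a fixed `hedge_delay` in seconds; neither name comes from the article. Unlike a retry, the original request is never abandoned: once `hedge_delay` passes, a single backup is raced against it, the first to finish wins, and the loser is cancelled.

```python
import asyncio


async def hedged_call(make_request, hedge_delay):
    """Send one request; if it is still running after hedge_delay seconds,
    send a single backup and return whichever finishes first.

    Returns (result, hedged), where hedged says whether a backup was sent."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result(), False  # fast path: no hedge needed

    # Unlike a retry, the slow primary keeps running and races the backup.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the loser's result is no longer needed
    return done.pop().result(), True


async def demo():
    delays = iter([0.5, 0.01])  # first attempt straggles, the backup is fast

    async def fake_backend():
        d = next(delays)
        await asyncio.sleep(d)
        return f"took {d}s"

    return await hedged_call(fake_backend, hedge_delay=0.05)


print(asyncio.run(demo()))  # ('took 0.01s', True)
```

Cancelling the loser matters: without it, every hedge would leave duplicate work running on the backend. Per-request timeouts and error handling are left out to keep the sketch short.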
"In a fan-out architecture with one hundred downstream services where each has a one percent straggler rate, sixty-three percent of top-level requests will be delayed by at least one straggler. Individual service health metrics can look fine because p50 is fast and p90 is acceptable. Then p99 reveals the problem. This makes diagnosing system-level tail latency difficult when relying on per-service dashboards."
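The sixty-three percent figure follows if each downstream call is assumed to straggle independently with probability 0.01. A top-level request avoids every straggler only if all one hundred calls are fast, so P(at least one straggler) = 1 - 0.99^100 ≈ 0.634. The small helper below (a hypothetical name, not from the article) shows how quickly that probability grows with fan-out. Correlated stragglers, such as a shared slow dependency, would change the numbers.

```python
def p_any_straggler(per_service_rate: float, fanout: int) -> float:
    """Probability that a top-level request waits on at least one straggler,
    assuming each of `fanout` downstream calls straggles independently."""
    return 1.0 - (1.0 - per_service_rate) ** fanout


for n in (1, 10, 100):
    print(n, round(p_any_straggler(0.01, n), 3))
# 1 0.01
# 10 0.096
# 100 0.634
```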
"Static hedge thresholds look effective in benchmarks but break in production, where latency distributions shift with load, deployments, and time of day. A threshold tuned for one environment stops being correct when the system changes, and keeping it correct requires continuous manual tuning, which rarely happens in practice. As a result, static hedging either under-reacts (hedging too rarely) or over-reacts (hedging too often) and fails to control tail latency reliably."
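One way to see why a fixed threshold drifts out of tune is to compare it with a threshold recomputed from recent traffic. The sketch below is a deliberately naive stand-in: it keeps the last `window` latencies in a deque and sorts them on every query, which is far more expensive than the streaming quantile sketch the article recommends. The class name, the 0.95 quantile, the window size, the default threshold, and the 20-sample warm-up are all illustrative assumptions.

```python
from collections import deque


class RollingQuantileThreshold:
    """Hypothetical adaptive hedge threshold: the q-th quantile of recent latencies."""

    def __init__(self, q=0.95, window=1000, default=0.1, min_samples=20):
        self.q = q
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.default = default               # threshold (seconds) used during warm-up
        self.min_samples = min_samples

    def record(self, latency):
        self.samples.append(latency)

    def threshold(self):
        if len(self.samples) < self.min_samples:
            return self.default
        ordered = sorted(self.samples)  # O(n log n) per query; acceptable only in a sketch
        idx = min(len(ordered) - 1, int(self.q * len(ordered)))
        return ordered[idx]


# Baseline traffic: latencies spread between 10 ms and about 20 ms.
t = RollingQuantileThreshold()
for i in range(1000):
    t.record(0.010 * (1 + (i % 100) / 100))
print(round(t.threshold(), 4))  # 0.0195

# After a deployment, latencies quadruple and the window fills with the new regime.
for i in range(1000):
    t.record(0.040 * (1 + (i % 100) / 100))
print(round(t.threshold(), 4))  # 0.078
```

A static threshold of, say, 25 ms tuned on the baseline would hedge every request after the shift, because all the new latencies exceed it, while the rolling threshold moves to about 78 ms on its own.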
"DDSketch provides O(1), constant-memory quantile estimation with a relative-error guarantee (plus or minus one percent). That makes it suitable for real-time, per-host latency tracking at approximately thirty-five nanoseconds of overhead per request. A token-bucket budget that caps the hedge rate at a configurable percentage of total traffic prevents the load-doubling spiral during genuine outages: when every request is slow, the budget runs out and hedging stops automatically, so the service degrades gracefully instead of amplifying the problem."
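The two mechanisms in this paragraph can be sketched together in Python. The `DDSketch` class below is a minimal re-implementation of the published idea, not Datadog's library: with relative accuracy `alpha`, each value is mapped to a logarithmically spaced bucket with ratio gamma = (1 + alpha) / (1 - alpha), so an insert costs one logarithm and one dictionary update, and every quantile estimate is within `alpha` of the true value in relative terms. It accepts positive values only and omits the bucket collapsing that production implementations use to bound memory. `HedgeBudget` is a hypothetical token bucket in which every request deposits `ratio` tokens and every hedge spends one, which holds hedges to roughly `ratio` of traffic. The names `HedgeBudget` and `should_hedge`, the 0.95 quantile, the 5% ratio, and the 10-token burst are assumptions, not values from the article.

```python
import math


class DDSketch:
    """Minimal DDSketch (positive values only, no bucket collapsing)."""

    def __init__(self, alpha=0.01):
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = {}  # bucket index -> count
        self.count = 0

    def add(self, x):
        # O(1): bucket i holds values in (gamma**(i-1), gamma**i]
        i = math.ceil(math.log(x) / self.log_gamma)
        self.buckets[i] = self.buckets.get(i, 0) + 1
        self.count += 1

    def quantile(self, q):
        # Walk buckets in order until the cumulative count passes rank q*(n-1).
        rank = q * (self.count - 1)
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                # Representative value: relative error <= alpha for any value in the bucket
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise ValueError("quantile of an empty sketch")


class HedgeBudget:
    """Hypothetical token bucket that keeps hedges near `ratio` of all requests."""

    def __init__(self, ratio=0.05, max_tokens=10.0):
        self.ratio = ratio
        self.max_tokens = max_tokens
        self.tokens = max_tokens  # allow a small initial burst

    def on_request(self):
        self.tokens = min(self.max_tokens, self.tokens + self.ratio)

    def try_hedge(self):
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def should_hedge(sketch, budget, elapsed, q=0.95):
    # Hedge only a request that has outlived this host's q-quantile,
    # and only if the budget still has a token to spend.
    return elapsed > sketch.quantile(q) and budget.try_hedge()


# Simulated outage: every request is slow, so every request asks to hedge.
sketch, budget = DDSketch(), HedgeBudget()
for ms in range(1, 1001):
    sketch.add(ms / 1000)  # warm up with latencies of 1..1000 ms
hedges = 0
for _ in range(1000):
    budget.on_request()
    hedges += should_hedge(sketch, budget, elapsed=5.0)
print(hedges)  # roughly 60, not 1000
```

In the simulated outage the budget admits about sixty hedges out of a thousand slow requests (the initial burst of ten plus about one per twenty requests) instead of doubling the load. In a real client the sketch would also keep absorbing the slow latencies, which raises the quantile threshold and reduces hedging further.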
Read at InfoQ