Resilience Best Practices: How Amazon Builds Well-Behaved Clients and Well-Protected Services
Briefly

Michael Haken from AWS shares insights into how cloud services manage demand and capacity, using the analogy of restaurant management during busy lunch hours. He emphasizes operational strategies such as load shedding, along with architectural approaches that help prevent service overload. By implementing automated capacity forecasting and over-provisioning, Amazon keeps its services resilient and responsive to client demand well before periods of high load. The article also outlines well-behaved client patterns that help improve the overall performance of cloud systems.
"Resilience lessons from the lunch rush" shares strategies used by the cloud provider for managing queue depth, implementing automated capacity forecasting, and employing load-shedding techniques.
Restaurants need to manage customer demand (load) as well as service time (latency) to maintain the customer experience their patrons expect (...). Other restaurants I worked at were able to use architectural approaches.
True overload scenarios are rare events in the cloud, but when one of these exceptional events does occur, there are different strategies to prevent impact to customers' experience.
Three operational strategies are suggested: load shedding, auto-scaling, and fairness. Load shedding means intentionally and temporarily discarding work so that a service is not overwhelmed.
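To make the load-shedding idea concrete, here is a minimal sketch in Python. It is not taken from the article or from any AWS implementation: the class `LoadShedder`, the helper `handle_request`, the `max_in_flight` limit, and the choice of a 503 status code for rejected work are all assumptions made for illustration. The sketch caps the number of requests being processed at once and rejects anything beyond that cap rather than queuing it.

```python
import threading


class LoadShedder:
    """Hypothetical helper: rejects new work once in-flight requests hit a limit."""

    def __init__(self, max_in_flight):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.max_in_flight = max_in_flight  # assumed capacity limit
        self.in_flight = 0                  # requests currently being processed
        self.shed_count = 0                 # requests rejected so far
        self._lock = threading.Lock()

    def try_acquire(self):
        """Reserve a slot; return False (shed the request) if none is free."""
        with self._lock:
            if self.in_flight >= self.max_in_flight:
                self.shed_count += 1
                return False
            self.in_flight += 1
            return True

    def release(self):
        """Free a slot once a request has finished."""
        with self._lock:
            if self.in_flight > 0:
                self.in_flight -= 1


def handle_request(shedder, work):
    """Run `work` if capacity allows; otherwise reject quickly.

    The 503 status is an illustrative choice for "overloaded, try later".
    """
    if not shedder.try_acquire():
        return 503, None
    try:
        return 200, work()
    finally:
        shedder.release()
```

A caller would create one `LoadShedder` per service instance and route every request through `handle_request`. Rejecting early keeps the accepted requests fast, at the cost of some clients having to retry later.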
Read at InfoQ