How Netflix Ensures Highly-Reliable Online Stateful Systems
Briefly

Reliable servers are redundant, workload-optimized, and heavily cached; they recover data quickly and rely on multiple replicated copies across cloud availability zones.
Reliable clients make constant incremental progress and learn how to retry or hedge requests to meet service level objectives (SLOs).
Reliable APIs depend on idempotency and fixed-size units of work.
Instead of focusing on the number of nines, consider how often systems fail, the blast radius, and recovery time to effectively address failure modes.
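The client and API points above (retrying safely, relying on idempotency) can be sketched in a few lines. This is an illustrative sketch only, not Netflix's implementation: `FlakyCounterServer`, `call_with_retries`, and all parameters are hypothetical names invented here. The server deduplicates writes by an idempotency token, so a client that retries after a lost response cannot apply the same increment twice.

```python
import random
import time
import uuid


class FlakyCounterServer:
    """Hypothetical in-memory server: applies the write, then loses the first few responses."""

    def __init__(self, failures_per_token=2):
        self.total = 0
        self._results = {}  # idempotency token -> result of the already-applied write
        self._calls = {}    # idempotency token -> number of calls seen
        self._failures_per_token = failures_per_token

    def increment(self, token, amount):
        # Idempotency: a replayed token returns the stored result without reapplying.
        if token not in self._results:
            self.total += amount
            self._results[token] = self.total
        self._calls[token] = self._calls.get(token, 0) + 1
        # Simulated failure: the write succeeded, but the client never hears back.
        if self._calls[token] <= self._failures_per_token:
            raise TimeoutError("response lost after the write was applied")
        return self._results[token]


def call_with_retries(server, amount, max_attempts=5, base_delay=0.001, sleep=time.sleep):
    """Retry one logical operation with exponential backoff and full jitter."""
    token = str(uuid.uuid4())  # one token per logical operation, reused on every retry
    for attempt in range(max_attempts):
        try:
            return server.increment(token, amount)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted; surface the failure to the caller
            sleep(random.uniform(0, base_delay * 2 ** attempt))


if __name__ == "__main__":
    server = FlakyCounterServer(failures_per_token=2)
    print(call_with_retries(server, 5))  # two lost responses, then success
    print(server.total)                  # the increment was applied exactly once
```

Without the token, each retry would add to the counter again. With it, retries are safe to repeat, which is what makes aggressive client-side retrying or hedging reasonable in the first place.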
Read at InfoQ