From Confusion to Clarity: Advanced Observability Strategies for Media Workflows at Netflix
Briefly

"Imagine binge watching your favorite show, like in Squid Game that most of you might have binge watched, I think, not knowing that there are millions of trace spans and thousands of microservice calls that are orchestrating all of this to happen so that you can watch this title on the product. Did you know it takes about 1 million trace spans to represent the workflow that is used to encode a single episode of Squid Game Season 2?"
"Now let's look at encoding this one episode of Squid Game Season 2 in numbers. When encoding is done for this one episode, it produces 140 video encodes, that is to support different video encode profiles and to support multiple bitrates. Similarly, for audio, there are 552 audio encodes that this one episode has. Then coming to the compute, 122,000 CPU hours were used to encode this title."
"Now switching focus into the observability side of the infrastructure, there are 1 million trace spans that represent the whole orchestration that goes on to produce these encodes, 140 video encodes and 552 audio encodes, and 27,000 unique microservice calls happening to orchestrate this encoding process for one episode. Then 30 microservices that are encoding microservices for audio, video, text, inspection, they were used for processing this title."
For a single hour-long episode of Squid Game Season 2, encoding produces 140 video encodes, covering multiple encode profiles and bitrates, and 552 audio encodes. The encoding workflow consumed approximately 122,000 CPU hours for that episode. On the observability side, the orchestration was represented by about 1 million trace spans and roughly 27,000 unique microservice calls, with approximately 30 encoding-related microservices handling audio, video, text, and inspection. Netflix must also support playback across a wide range of device types and serve multiple media processing use cases, such as playback, ads, trailer generation, and pre-production studio workflows, so this per-episode scale multiplies dramatically across a global content catalog.
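To put the quoted per-episode figures side by side, here is a minimal Python sketch that derives a few rough averages from them. The input numbers come from the talk; the ratios, the constant names, and the `derived_ratios` helper are illustrative assumptions, not metrics Netflix reports. In particular, the CPU-hours-per-encode figure is a crude average that lumps audio and video together and ignores any non-encode work such as inspection.

```python
# Figures quoted in the talk for one episode of Squid Game Season 2.
VIDEO_ENCODES = 140
AUDIO_ENCODES = 552
CPU_HOURS = 122_000
TRACE_SPANS = 1_000_000   # "about 1 million"
SERVICE_CALLS = 27_000    # unique microservice calls
MICROSERVICES = 30        # encoding-related microservices


def derived_ratios() -> dict[str, float]:
    """Return rough averages derived from the quoted totals (illustrative only)."""
    total_encodes = VIDEO_ENCODES + AUDIO_ENCODES
    return {
        "total_encodes": total_encodes,
        "spans_per_call": TRACE_SPANS / SERVICE_CALLS,
        "spans_per_encode": TRACE_SPANS / total_encodes,
        "calls_per_microservice": SERVICE_CALLS / MICROSERVICES,
        "cpu_hours_per_encode": CPU_HOURS / total_encodes,
    }


if __name__ == "__main__":
    for name, value in derived_ratios().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, each microservice call corresponds to a few dozen spans on average, which gives a sense of why a single episode's trace is hard to inspect without aggregation.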
Read at InfoQ