
"Engineers at Netflix have uncovered deep performance bottlenecks in container scaling that trace not to Kubernetes or containerd alone, but into the CPU architecture and Linux kernel itself. In a detailed blog post, Netflix technologists explain how their move to a modern container runtime exposed surprising contention on global mount locks in the kernel's virtual filesystem (VFS), revealing that underlying hardware topology and lock contention can limit the scaling of hundreds of containers concurrently, even on powerful cloud servers."
"Investigations showed the mount table ballooning dramatically during the startup of many-layer container images, straining the kernel's global mount lock as containerd executed thousands of bind mount operations to map user namespaces for each image layer. With every container requiring dozens of mounts and unmounts, the cumulative workload easily exceeded 20,000 mount syscalls during large bursts, all needing access to the same kernel lock, a classic concurrency bottleneck deep in the operating system."
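One way to observe the mount-table growth described above is to count the entries the kernel reports in `/proc/self/mountinfo`. The following is a minimal sketch, not Netflix's tooling: the function name `summarize_mountinfo` and the `SAMPLE` text are illustrative assumptions, and the parser relies only on the documented mountinfo layout, in which the filesystem type is the field right after the lone `-` separator.

```python
import os
from collections import Counter


def summarize_mountinfo(text):
    """Count mount table entries per filesystem type.

    `text` is expected to follow the /proc/<pid>/mountinfo layout, where the
    filesystem type is the first field after the "-" separator.
    """
    by_fstype = Counter()
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # skip blank or malformed lines
        sep = fields.index("-")
        if sep + 1 >= len(fields):
            continue  # separator present but no filesystem type after it
        by_fstype[fields[sep + 1]] += 1
    return by_fstype


# Hypothetical sample in mountinfo format, used only for illustration.
SAMPLE = """\
22 28 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
41 36 98:0 /layers/a /run/demo/a rw,relatime - ext3 /dev/root rw
"""

if __name__ == "__main__":
    path = "/proc/self/mountinfo"  # present on Linux only
    if os.path.exists(path):
        with open(path) as fh:
            counts = summarize_mountinfo(fh.read())
    else:
        counts = summarize_mountinfo(SAMPLE)
    print("total mounts:", sum(counts.values()))
    for fstype, n in counts.most_common():
        print(f"{fstype}: {n}")
```

Sampling this total repeatedly during a burst of container starts would show whether the table is growing; it reports table size only, not lock contention itself.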
"Netflix's performance team found that not all CPU architectures behave the same under this load. On older dual-socket AWS r5.metal instances (with multiple NUMA domains and mesh-based cache coherence), high concurrency accelerated contention on shared caches and global locks, severely degrading performance. By contrast, newer single-socket instances such as AWS m7i.metal (Intel) and m7a.24xlarge (AMD) with distributed cache architectures scaled much more smoothly, with fewer stalls even as container counts climbed."
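To check which kind of topology a given host exposes, the NUMA node count can be read from the `nodeN` directories that Linux publishes under `/sys/devices/system/node`. This is a hedged sketch for orientation only: the helper name `numa_node_ids` is an assumption, and the script does not measure cache coherence or lock contention, only how many NUMA nodes the kernel reports.

```python
import os
import re

NODE_DIR = "/sys/devices/system/node"  # standard sysfs location on Linux


def numa_node_ids(entries):
    """Return the sorted NUMA node IDs found in a directory listing.

    `entries` is a list of names such as os.listdir(NODE_DIR) would return;
    only names of the form "node<N>" are treated as NUMA nodes.
    """
    ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    if os.path.isdir(NODE_DIR):
        ids = numa_node_ids(os.listdir(NODE_DIR))
        print(f"{len(ids)} NUMA node(s): {ids}")
    else:
        print("NUMA topology not exposed at", NODE_DIR)
```

A host reporting more than one node has multiple NUMA domains, the configuration the article associates with heavier contention under concurrent mounts.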
Netflix engineers traced critical container-scaling bottlenecks not to Kubernetes or containerd alone but to contention on global mount locks in the Linux kernel's virtual filesystem (VFS). During high-concurrency startup of many-layer container images, the mount table grows sharply as containerd executes thousands of bind mount operations to map user namespaces for each image layer. Large bursts can exceed 20,000 mount syscalls, all competing for the same kernel lock, a classic concurrency bottleneck that causes nodes to stall for extended periods. Performance analysis showed that CPU architecture strongly affects how this contention plays out: older dual-socket instances such as AWS r5.metal, with multiple NUMA domains and mesh-based cache coherence, degraded severely under high concurrency, while newer single-socket instances such as AWS m7i.metal (Intel) and m7a.24xlarge (AMD), with distributed cache architectures, scaled more smoothly with fewer stalls. NUMA effects, hyperthreading, and cache microarchitecture all influence how global lock contention propagates through the system.
#container-scaling-performance #linux-kernel-mount-locks #cpu-architecture-impact #numa-and-cache-contention #kubernetes-infrastructure
Read at InfoQ