The delayed reactions of the Kubernetes Horizontal Pod Autoscaler (HPA) can hurt edge performance, whereas a custom autoscaler can achieve more stable scale-up and scale-down behavior by evaluating domain-specific metrics and multiple signals together. Pod startup time should be factored into the autoscaling logic, because reacting only once CPU usage spikes delays scale-up and degrades performance. Safe scale-down policies and a cooldown window are necessary to prevent replica oscillation, especially when high-frequency metric signals are used.
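The ideas above can be sketched in a few lines. This is a toy decision function, not any real autoscaler's API: all names (`CustomAutoscaler`, `desired_replicas`, the parameters) are hypothetical, and it only illustrates projecting load forward by pod startup time and rate-limiting scale-down with a cooldown window.

```python
import time

class CustomAutoscaler:
    """Toy sketch (hypothetical names): scale up early by projecting load
    forward by the pod startup time, and rate-limit scale-down with a
    cooldown window to avoid replica oscillation."""

    def __init__(self, pod_startup_s=30.0, cooldown_s=300.0,
                 target_util=0.6, min_replicas=1):
        self.pod_startup_s = pod_startup_s  # time a new pod needs to become ready
        self.cooldown_s = cooldown_s        # no repeated scale-down inside this window
        self.target_util = target_util
        self.min_replicas = min_replicas
        self.last_scale_down = 0.0

    def desired_replicas(self, replicas, utilization, util_trend_per_s, now=None):
        now = time.monotonic() if now is None else now
        # Project utilization forward by the startup time, so we react
        # before the spike saturates the pods instead of after.
        projected = utilization + util_trend_per_s * self.pod_startup_s
        desired = max(self.min_replicas,
                      round(replicas * projected / self.target_util))
        if desired < replicas:
            # Safe scale-down: only shrink after the cooldown has elapsed,
            # and by at most one replica per decision.
            if now - self.last_scale_down < self.cooldown_s:
                return replicas
            self.last_scale_down = now
            return replicas - 1
        return desired
```

Note the asymmetry: scale-up acts immediately on the projected signal, while scale-down is deliberately damped, which is what keeps high-frequency metrics from causing flapping.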
Steve Yegge thinks he has the answer. The veteran engineer, with 40+ years across Amazon, Google, and Sourcegraph, spent the second half of 2025 building Gas Town, an open-source orchestration system that coordinates 20 to 30 Claude Code instances working in parallel on the same codebase. He describes it as "Kubernetes for AI coding agents." The comparison isn't just marketing. It's architecturally accurate.
Kubernetes has transitioned from a versatile framework for container orchestration to the primary engine powering the global surge in artificial intelligence development. The Cloud Native Computing Foundation (CNCF) highlighted this evolution in a recent report, which examines the intersection of cloud-native infrastructure and machine learning. While the technical capabilities of the ecosystem have reached a point of high maturity, the research suggests that human and organisational factors now serve as the most significant barriers to successful deployment.
In a two-part blog series, Soam Acharya, Rainie Li, William Tom and Ang Zhang describe how the Pinterest Big Data Platform team considered alternatives for their next-generation massive-scale data processing platform as the limits of the existing Hadoop-based system, known internally as Monarch, became clear. They present Moka as the outcome of that search: an EKS-based, cloud-native data processing platform that now runs production workloads at Pinterest scale.
Over the past decade, software development has been shaped by two closely related transformations. One is the rise of DevOps and continuous integration and continuous delivery (CI/CD), which brought development and operations teams together around automated, incremental software delivery. The other is the shift from monolithic applications to distributed, cloud-native systems built from microservices and containers, typically managed by orchestration platforms such as Kubernetes.
In today's episode, I will be speaking with Somtochi Onyekwere, a software engineer at Fly.io. We will discuss recent developments in distributed data systems, especially topics like eventual consistency and how to achieve fast, eventually consistent replication across distributed nodes. We'll also talk about conflict-free replicated data types (CRDTs) and how they can help with conflict resolution when managing data in distributed storage systems.
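To make the CRDT idea concrete before the conversation: the simplest CRDT is a grow-only counter, where each node increments only its own slot and replicas merge by taking the element-wise maximum. Because that merge is commutative, associative, and idempotent, replicas converge without coordination. A minimal sketch (class and method names are my own, purely illustrative):

```python
class GCounter:
    """Minimal grow-only counter CRDT: each node increments its own slot;
    merging takes the element-wise max, so merges in any order converge."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> that node's local increment count

    def increment(self, n=1):
        # A node only ever writes its own slot, so there are no conflicts.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent:
        # replicas converge regardless of how often or in what order
        # they exchange state.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())
```

Richer CRDTs (sets, maps, sequences) build on the same principle: design the merge so that concurrent updates compose without a central arbiter.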
Software Architecture and Design Trends Report 2025: This report explores how architects are adapting to a world shaped by AI. As large language models (LLMs) become commonplace, attention is turning toward small, specialized models, agentic systems, and retrieval-augmented generation (RAG) as practical design patterns. Architects are now being asked to balance efficiency, quality, sustainability, and decentralized decision-making. Culture and Methods Trends Report 2025: This report highlights a parallel tension.
Amazon Web Services has launched Amazon EKS Capabilities, a set of fully managed, Kubernetes-native features designed to streamline workload orchestration, AWS cloud resource management, and Kubernetes resource composition and automation. The capabilities, now generally available across most AWS commercial regions, bundle popular open-source tools into a managed platform layer, reducing the operational burden on engineering teams and enabling faster application deployment and scaling on Amazon Elastic Kubernetes Service (EKS).
AWS Identity Misconfigurations: We will show how attackers abuse simple setup errors in AWS identities to gain initial access without stealing a single password. Hiding in AI Models: You will see how adversaries mask malicious files in production by mimicking the naming structures of your legitimate AI models. Risky Kubernetes Permissions: We will examine "overprivileged entities" (containers that have too much power) and how attackers exploit them to take over infrastructure.
The feat was achieved by re-architecting key components of Kubernetes' control plane and storage backend, replacing the traditional etcd data store with a custom Spanner-based system that can support massive scale, and optimizing cluster APIs and scheduling logic to reduce load from constant node and pod updates. The engineering team also introduced new tooling for automated, parallelized node pool provisioning and faster resizing, helping overcome typical bottlenecks that would hinder responsiveness at such a scale.
This challenge is sparking innovation in the inference stack, and that is where Dynamo comes in. Dynamo is an open-source framework for distributed inference. It manages execution across GPUs and nodes, breaks inference into phases such as prefill and decode, and separates memory-bound from compute-bound tasks, dynamically managing GPU resources to raise utilization while keeping latency low. Dynamo allows infrastructure teams to scale inference capacity responsively, handling demand spikes without permanently overprovisioning expensive GPU resources.
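The prefill/decode split mentioned above can be illustrated with a toy sketch. This is not Dynamo's actual API; the function names and the list-based stand-in for a KV cache are mine. It only shows why the two phases are natural to disaggregate: prefill processes the whole prompt once (compute-bound), while decode repeatedly reads a growing KV cache to emit one token at a time (memory-bound), so a scheduler can place them on differently sized worker pools.

```python
# Toy illustration of disaggregated LLM inference (not Dynamo's real API).

def prefill(prompt_tokens):
    # Compute-bound phase: process every prompt token once and build the
    # KV cache (a plain list stands in for the real attention tensors).
    return [("kv", t) for t in prompt_tokens]

def decode(kv_cache, max_new_tokens):
    # Memory-bound phase: each step reads the whole (growing) KV cache
    # and appends one new token's entry to it.
    out = []
    for _ in range(max_new_tokens):
        token = f"tok{len(kv_cache)}"  # stand-in for the next-token computation
        kv_cache.append(("kv", token))
        out.append(token)
    return out

def generate(prompt_tokens, max_new_tokens):
    # In a real disaggregated deployment, a router (Dynamo's job) would
    # place these two calls on separate GPU pools sized for each phase,
    # transferring the KV cache between them.
    kv = prefill(prompt_tokens)
    return decode(kv, max_new_tokens)
```

Because the phases have different bottlenecks, scaling them independently lets each pool run closer to saturation than a monolithic deployment could.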
Discord has detailed how it rebuilt its machine learning platform after hitting the limits of single-GPU training. By standardising on Ray and Kubernetes, introducing a one-command cluster CLI, and automating workflows through Dagster and KubeRay, the company turned distributed training into a routine operation. The changes enabled daily retrains for large models and contributed to a 200% uplift in a key ads ranking metric. Similar engineering reports are emerging from companies such as Uber, Pinterest, and Spotify as bespoke models grow in size and frequency.
Docker recently announced the release of Docker Desktop 4.50, another update for developers seeking faster, more secure workflows and expanded AI-integration capabilities. The release introduces a free version of Docker Debug for all users, deeper IDE integration (including VS Code and Cursor), improved support for converting multi-service applications to Kubernetes, new enterprise-grade governance controls, and early support for Model Context Protocol (MCP) tooling.
AI inference is the process by which a trained large language model (LLM) applies what it has learned to new data to make predictions, decisions, or classifications. In practical terms, the process goes like this. After a model is trained, say the new GPT 5.1, we use it during the inference phase, where it analyzes data (like a new image) and produces an output (identifying what's in the image) without being explicitly programmed for each fresh image. These inference workloads bridge the gap between LLMs and AI chatbots and agents.
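The train-once, infer-many split described above can be shown with a deliberately tiny stand-in model. This is purely didactic (a nearest-centroid classifier, nothing like an LLM; all names are mine): training learns parameters from labeled data once, and inference then applies the frozen model to inputs it was never explicitly programmed to handle.

```python
# Minimal train-once / infer-many illustration (toy nearest-centroid model).

def train(examples):
    # Training phase: learn one centroid per label from labeled data.
    return {label: sum(xs) / len(xs) for label, xs in examples.items()}

def infer(model, x):
    # Inference phase: the frozen model classifies a new, unseen input
    # by picking the label whose centroid is closest.
    return min(model, key=lambda label: abs(model[label] - x))

# Train once on labeled feature values...
model = train({"cat": [1.0, 2.0], "dog": [8.0, 10.0]})
# ...then serve inference on fresh inputs as many times as needed.
```

An LLM follows the same lifecycle at vastly larger scale: the expensive learning happens once, and the inference workloads that power chatbots and agents are repeated applications of the resulting frozen weights.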
A safe, universal platform for AI workloads: CKACP's goal is to create community-defined, open standards for running AI workloads consistently and reliably across different Kubernetes environments. CNCF CTO Chris Aniszczyk said, "This conformance program will create shared criteria to ensure AI workloads behave predictably across environments. It builds on the same successful community-driven process we've used with Kubernetes to help bring consistency across over 100-plus Kubernetes systems as AI adoption scales."
The operational scale of their K8s platform includes 1,400 K8s clusters, millions of pods, thousands of compute nodes, more than 40 operators and integrations, and over 200 monitoring plugins. The speakers estimated that capacity will grow fivefold over the next couple of years. The overall goal of the solution is to let application teams focus on business requirements rather than get bogged down in infrastructure overhead.
Google is launching Agent Sandbox, a new Kubernetes primitive built for AI agents. The technology provides kernel-level isolation and can run thousands of sandboxes in parallel. Google built Agent Sandbox as an open-source project within the Cloud Native Computing Foundation. It is based on gVisor, with additional support for Kata Containers, and each agent task is assigned its own isolated sandbox, so the kernel-level isolation contains the blast radius of any vulnerability an agent triggers.
The Cloud Native Computing Foundation (CNCF) published a blog post discussing how vCluster, an open-source project by Loft Labs, addresses key multi-tenancy obstacles in Kubernetes clusters by enabling "virtual clusters" within a single host cluster. This approach enables multiple tenants to have isolated control planes while sharing underlying compute resources, thereby reducing overhead without compromising isolation. Traditional namespace-based isolation in Kubernetes often falls short when tenants need to deploy cluster-scoped resources like custom resource definitions (CRDs).
Airbnb's engineering team has rolled out Mussel v2, a complete rearchitecture of its internal key-value engine designed to unify streaming and bulk ingestion while simplifying operations and scaling to larger workloads. The new system reportedly sustains over 100,000 streaming writes per second, supports tables exceeding 100 terabytes with p99 read latencies under 25 milliseconds, and ingests tens of terabytes in bulk workloads, allowing caller teams to focus on product innovation rather than managing data pipelines.
Sidero Labs has been developing Talos Linux, an immutable operating system purpose-built exclusively for running Kubernetes, alongside Omni, a cluster lifecycle management platform. InfoQ met the Sidero team in Amsterdam during TalosCon 2025 and spoke with them about their approach to simplifying Kubernetes operations through minimalism and security-first design. The concept for Talos emerged from practical frustrations with traditional operating systems in enterprise environments.
But Leo's expertise doesn't stop at tech. He also founded Homeland Shrimp, an indoor aquaculture business he engineered himself. His self-heating, closed-loop system is a blend of thermodynamics, automation, and sustainable thinking, designed to raise Pacific white shrimp efficiently and responsibly. Leo volunteers locally, helping seniors with yard care through a Sherburne County initiative. He also supports causes like Imagine Farm, which promotes sustainable agriculture.