Agentic AI shifts the system bottleneck from raw compute to memory: prolonged KV cache residency demands greater capacity, bandwidth, and fast hierarchical memory switching.
APIs for large language models are an inadequate abstraction; the real problem is distributed state synchronization involving token histories and GPU KV caches.