Running AI models is turning into a memory game | TechCrunch
Briefly

"It started off as a very simple page six or seven months ago, especially as Claude Code was launching - just "use caching, it's cheaper." Now it's an encyclopedia of advice on exactly how many cache writes to pre-buy. You've got 5-minute tiers, which are very common across the industry, or 1-hour tiers - and nothing above. That's a really important tell."
"When we talk about the cost of AI infrastructure, the focus is usually on Nvidia and GPUs - but memory is an increasingly important part of the picture. As hyperscalers prepare to build out billions of dollars worth of new data centers, the price for DRAM chips has jumped roughly 7x in the last year. At the same time, there's a growing discipline in orchestrating all that memory to make sure the right data gets to the right agent at the right time."
DRAM chip prices have surged roughly sevenfold over the past year as hyperscalers plan multibillion-dollar data center expansions. Memory costs are becoming a major component of AI infrastructure economics alongside GPUs. Orchestrating memory to route the right data to the right agent at the right time reduces token usage and lowers inference costs. Prompt-caching tiers, such as five-minute and one-hour windows, create pricing tradeoffs and arbitrage opportunities tied to pre-purchased cache writes and reads. Effective cache management can dramatically lower operational spending, but adding new data to queries increases cache writes and can raise costs if not carefully controlled.
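To make the caching tradeoff concrete, here is a minimal cost sketch in Python. The base per-token rate and the tier multipliers (a 1.25x write premium for the five-minute tier, 2x for the one-hour tier, 0.1x for cache reads) are assumptions chosen to resemble common industry pricing, not figures from the article.

```python
# Rough cost model for prompt caching with tiered cache writes.
# All rates below are illustrative assumptions, not quoted prices.

BASE_INPUT = 3.00 / 1_000_000  # assumed $ per input token at the base rate

TIERS = {
    "5-minute": {"write_mult": 1.25, "read_mult": 0.10},  # assumed multipliers
    "1-hour":   {"write_mult": 2.00, "read_mult": 0.10},
}

def cached_cost(prefix_tokens: int, reuses: int, tier: str) -> float:
    """One cache write plus `reuses` cached reads of the same prefix."""
    t = TIERS[tier]
    write = prefix_tokens * BASE_INPUT * t["write_mult"]
    reads = reuses * prefix_tokens * BASE_INPUT * t["read_mult"]
    return write + reads

def uncached_cost(prefix_tokens: int, reuses: int) -> float:
    """Re-sending the full prefix at the base rate on every request."""
    return (1 + reuses) * prefix_tokens * BASE_INPUT

if __name__ == "__main__":
    prefix, reuses = 50_000, 20  # a large shared context, reused 20 times
    for tier in TIERS:
        print(f"{tier}: cached ${cached_cost(prefix, reuses, tier):.4f} "
              f"vs uncached ${uncached_cost(prefix, reuses):.4f}")
```

Under these assumed rates, caching wins whenever a large prefix is reused even a handful of times within its window, but any change to the prefix invalidates the cache and forces a fresh write at the premium rate - which is why carelessly appending new data to queries can raise costs rather than lower them.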
Read at TechCrunch