Meta shifts to AI inference with its future chips
"Four generations, MTIA 300, 400, 450, and 500, have been produced within less than two years, with several already in production and others scheduled for mass deployment in 2026 and 2027. The quick pace is deliberate. Rather than betting on a single chip generation and waiting years for results, Meta has adopted a roughly six-month cadence per generation, using modular chiplet architecture to enable incremental upgrades without replacing entire rack systems."
"MTIA 300 was built for Meta's ranking and recommendation (R&R) workloads and is currently in production for R&R training. As generative AI grew, MTIA 300 evolved into MTIA 400, featuring a 72-accelerator scale-up domain and 400% higher FP8 FLOPS over its predecessor. Meta says MTIA 400 has finished lab testing and is on the path to data center deployment."
"MTIA 450 targets GenAI inference specifically, doubling HBM bandwidth over MTIA 400, exceeding leading commercial products, Meta claims, and delivering 6x the MX4 FLOPS of FP16/BF16. Mass deployment is scheduled for early 2027. MTIA 500 then adds a further 50% HBM bandwidth increase, up to 80% more HBM capacity, and 43% higher MX4 FLOPS over MTIA 450."
Meta has produced four successive generations of its MTIA AI chip in partnership with Broadcom in under two years. The MTIA 300, 400, 450, and 500 chips are either in production or scheduled for data center deployment in 2026 and 2027. Meta adopted a roughly six-month cadence per generation, using a modular chiplet architecture that allows incremental upgrades without replacing entire rack systems. MTIA 300 initially targeted ranking and recommendation workloads. MTIA 400 evolved for generative AI with 400% higher FP8 FLOPS. MTIA 450 specifically targets GenAI inference with doubled HBM bandwidth. MTIA 500 adds 50% more HBM bandwidth and 43% higher MX4 FLOPS. From MTIA 300 to 500, HBM bandwidth increases 4.5x and compute FLOPS 25x.
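The cumulative 4.5x bandwidth figure follows from multiplying the per-generation steps. A minimal sketch of that arithmetic, assuming a 1.5x step from MTIA 300 to 400 (the article states only the later two steps explicitly, so that first multiplier is an inference made here to match the 4.5x total):

```python
# Cumulative HBM-bandwidth gain across MTIA generations.
# The 400->450 (2x) and 450->500 (1.5x) multipliers come from the article;
# the 300->400 step (1.5x) is an assumption chosen so the chain matches
# the stated 4.5x overall figure.
from math import prod

steps = {
    "MTIA 300 -> 400": 1.5,  # assumed, not stated explicitly
    "MTIA 400 -> 450": 2.0,  # "doubling HBM bandwidth over MTIA 400"
    "MTIA 450 -> 500": 1.5,  # "a further 50% HBM bandwidth increase"
}

cumulative = prod(steps.values())
print(f"MTIA 300 -> 500 bandwidth gain: {cumulative}x")  # prints 4.5x
```

The compute-FLOPS trajectory is harder to decompose the same way, since the per-generation figures mix FP8 and MX4 formats.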
Read at Techzine Global