Microsoft launches its second generation AI inference chip, Maia 200
Briefly

"Signaling that the future of AI may not just be how many tokens an AI model generates, but how optimally it does so, Microsoft has announced Maia 200, which it described as a breakthrough inference accelerator and inference powerhouse. The AI silicon is designed for heterogeneous AI infrastructure in multiple environments, and was specifically developed for inferencing on large reasoning models. Microsoft claims it is the most performant first-party silicon from any hyperscaler today, and the most efficient inference system it has ever deployed."
"Maia 200 delivers 3X better 4-bit floating-point (FP4) performance than third-generation Amazon Trainium, Microsoft claims, and 8-bit floating point (FP8) performance above that of Google's seventh generation TPU. By the numbers, this means that Maia features: 10,145 four-bit floating point (FP4) teraflops at peak, versus 2,517 with AWS Trainium3 5,072 eight-bit floating point (FP8) teraflops at peak, versus 2,517 with Trainium3, and 4,614 with Google TPU version 7"
Maia 200 is an inference-focused AI accelerator engineered for heterogeneous environments and large reasoning models. It delivers substantially higher FP4 and FP8 throughput than comparable accelerators, with peak performance of 10,145 FP4 teraflops and 5,072 FP8 teraflops, and pairs 216GB of high-bandwidth memory with 7 terabits per second of HBM bandwidth. Microsoft positions Maia 200 as the most performant first-party silicon from any hyperscaler and claims a 30% improvement in performance per dollar over its current fleet generation. The platform emphasizes inference efficiency and throughput for agentic, multi-model AI workloads.
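As a rough, back-of-the-envelope illustration, the throughput ratios implied by the quoted peak figures can be computed directly. The sketch below uses only the peak-teraflops numbers cited above (Microsoft's claims, not measured benchmarks), and the labels and ratio function are illustrative rather than anything from the article.

```python
# Back-of-the-envelope comparison of the peak-teraflops figures quoted above.
# These are the vendors' claimed peak numbers, not measured workload results.

peak_tflops = {
    "Maia 200 (FP4)": 10_145,
    "AWS Trainium3 (FP4)": 2_517,
    "Maia 200 (FP8)": 5_072,
    "AWS Trainium3 (FP8)": 2_517,
    "Google TPU v7 (FP8)": 4_614,
}

def ratio(a: str, b: str) -> float:
    """Ratio of claimed peak teraflops between two entries."""
    return peak_tflops[a] / peak_tflops[b]

if __name__ == "__main__":
    print(f"Maia 200 FP4 vs Trainium3 FP4: {ratio('Maia 200 (FP4)', 'AWS Trainium3 (FP4)'):.2f}x")
    print(f"Maia 200 FP8 vs Trainium3 FP8: {ratio('Maia 200 (FP8)', 'AWS Trainium3 (FP8)'):.2f}x")
    print(f"Maia 200 FP8 vs TPU v7 FP8:    {ratio('Maia 200 (FP8)', 'Google TPU v7 (FP8)'):.2f}x")
```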
Read at Computerworld