Microsoft introduces AI accelerator for US Azure customers | Computer Weekly
Briefly

"Microsoft describes Maia 200 as an inference powerhouse, built on TSMC 3nm process with native FP8/FP4 (floating point) tensor cores, a redesigned memory system that uses 21 6GB of the latest high-speed memory architecture (HBM3e). This is capable of transferring data at 7TB per second. Maia also provides 272MB of on-chip memory plus data movement engines, which Microsoft said is used to keep massive models fed, fast and highly utilised."
"According to the company, these hardware features mean Maia 200 is capable of delivering three times the FP4 performance of the third generation Amazon Trainium, and FP8 performance above Google's seventh-generation tensor processing unit. Microsoft said Maia 200 represents its most efficient inference system yet, offering 30% better cost performance over existing systems, but at the time of writing, it was unable to give a date as to when the product would be available outside of the US."
"In a blog post describing how Maia 200 is being deployed, Scott Guthrie, Microsoft executive vice-president for cloud and AI, said the setup comprises racks of trays configured with four Maia accelerators. Each tray is fully connected with direct, non‑switched links, to keep high‑bandwidth communication local for optimal inference efficiency. He said the same communication protocol is used for intra-rack and inter-rack networking using the Maia AI transport protocol to provide a way to scale clusters of Maia 200 accelerators with minimal network hops."
Microsoft is deploying Maia 200, an AI inference accelerator built on TSMC's 3nm process with native FP8 and FP4 tensor cores and a redesigned memory system using 216GB of HBM3e capable of 7TB/s transfer. Maia 200 includes 272MB of on-chip memory and data movement engines to keep large models fed and highly utilized. Microsoft claims Maia 200 delivers three times the FP4 performance of the third-generation Amazon Trainium and FP8 performance above Google's seventh-generation TPU, with about 30% better cost performance versus existing systems. Initial deployment targets Azure's US Central and US West 3 regions, with wider availability not yet scheduled. Racks use trays of four Maia accelerators connected by direct, non-switched links and the Maia AI transport protocol to minimize hops and simplify scaling.
Read at ComputerWeekly.com