
"Microsoft believes the next generation of AI models will use hundreds of trillions of parameters. To train them, it's not just building bigger, more efficient datacenters - it's started connecting distant facilities using high-speed networks spanning hundreds or thousands of miles. The first node of this multi-datacenter cluster came online in October, connecting Microsoft's datacenter campus in Mount Pleasant, Wisconsin, to a facility in Atlanta, Georgia. The software giant's goal is to eventually scale AI workloads across datacenters using similar methods as employed to distribute high-performance computing and AI workloads across multiple servers today."
""To make improvements in the capabilities of the AI, you need to have larger and larger infrastructure to train it," said Microsoft Azure CTO Mark Russinovich in a canned statement. "The amount of infrastructure required now to train these models is not just one datacenter, not two, but multiples of that." These aren't any ordinary datacenters, either. The facilities are the first in a family of bit barns Microsoft is calling its "Fairwater" clusters."
Microsoft is linking geographically separated datacenters with high-speed networks to train vastly larger AI models, ones it expects to require hundreds of trillions of parameters. The initial link connects Mount Pleasant, Wisconsin, to Atlanta, Georgia, as the first node of a multi-datacenter cluster, and Microsoft plans to scale AI workloads across these sites using methods similar to those used to distribute high-performance computing jobs across multiple servers. The Fairwater facilities are two-story "bit barns" that use direct-to-chip liquid cooling and consume minimal water. Plans call for hundreds of thousands of GPUs of varying types, including Nvidia GB200 NVL72 rack-scale systems in Atlanta, which combine high per-rack power density with substantial sparse FP8 compute and large pools of HBM3e memory.
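Microsoft hasn't published details of its cross-datacenter training stack, but the general technique the article alludes to is well established: synchronous data-parallel training, where every worker holds a model replica and gradients are averaged across all ranks with a collective all-reduce. The same pattern applies whether the ranks sit in one rack or, with far higher latency, in facilities hundreds of miles apart. A minimal sketch in PyTorch, assuming a torchrun launch and the gloo backend (nccl would be used on real GPU clusters); none of this reflects Microsoft's actual Fairwater software:

# Illustrative only: generic synchronous data-parallel training, not
# Microsoft's cross-datacenter stack.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun supplies RANK, WORLD_SIZE, and the rendezvous endpoint;
    # ranks may live on one machine or on hosts in different datacenters.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU clusters
    rank = dist.get_rank()

    model = torch.nn.Linear(1024, 1024)
    ddp_model = DDP(model)  # wraps the replica for gradient synchronization
    opt = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.randn(32, 1024)          # stand-in for a local data shard
        loss = ddp_model(x).square().mean()
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients across every rank here
        opt.step()       # all replicas apply the same averaged update
        if rank == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, for example, torchrun --nnodes=2 --nproc-per-node=1 --rdzv-endpoint=HOST:29500 train.py on each node. The all-reduce in the backward pass is exactly the step that cross-datacenter links make expensive, which is why long-haul bandwidth and latency dominate the engineering problem Microsoft describes.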
Read the full article at The Register