Despite the introduction of new rack-scale systems such as Nvidia's NVL72, AMD's Helios, and Intel's Jaguar Shores, eight-way HGX servers appear to have a lasting role in data centers. High costs and energy demands remain hurdles for enterprises exploring AI applications. AMD's Helios design caters to specific hyperscaler needs and is part of a broader strategy. The landscape is shifting: demand for larger compute systems is expected to grow, pointing to a future in which models trained on even larger GPU clusters are common.
The shift toward rack-scale architecture underscores a change in appetite among model developers. To date, most foundation models have been trained on eight-way GPU systems like Nvidia's DGX H100.
These larger compute domains offer a number of advantages for compute- and memory-hungry training workloads, since the network is one of the biggest bottlenecks when training at scale.
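To see why the network matters so much, consider a rough back-of-envelope sketch of the per-step gradient all-reduce in data-parallel training. The ring all-reduce formula (each GPU moves roughly 2(n-1)/n of the gradient bytes) is standard; the model size and the bandwidth figures below are illustrative assumptions, not measurements of any specific system.

```python
def ring_allreduce_seconds(payload_bytes: float, n_gpus: int, bus_gbs: float) -> float:
    """Idealized ring all-reduce time: each GPU sends and receives
    about 2 * (n-1)/n of the payload at the given per-GPU bandwidth (GB/s)."""
    traffic = 2 * (n_gpus - 1) / n_gpus * payload_bytes
    return traffic / (bus_gbs * 1e9)

params = 70e9               # hypothetical 70B-parameter model (assumption)
grad_bytes = params * 2     # bf16 gradients, 2 bytes each

# Assumed per-GPU bandwidths: NVLink-class inside a node vs. a
# 400G-class fabric between nodes. Both numbers are illustrative.
intra_node = ring_allreduce_seconds(grad_bytes, 8, 450.0)
inter_node = ring_allreduce_seconds(grad_bytes, 8, 50.0)

print(f"intra-node all-reduce: ~{intra_node:.2f} s/step")
print(f"inter-node all-reduce: ~{inter_node:.2f} s/step")
```

Under these assumptions the same collective is roughly 9x slower once it has to cross the slower inter-node fabric, which is the intuition behind widening the fast interconnect domain from eight GPUs to a full rack.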