AI giants call for energy grid kumbaya
Briefly

AI training workloads produce large, synchronous oscillations in power demand between GPU compute phases and communication phases. Compute phases push GPUs toward thermal limits and consume high power while communication phases approach idle energy usage, creating extreme demand differences. These oscillations occur at server, rack, datacenter, and grid levels because parallel training synchronizes workloads, amplifying aggregate variability. At scale, swings can reach tens or hundreds of megawatts and can align with resonant frequencies of grid components, risking instability and mechanical failure. Utility measurements and projections indicate growing grid stress from data center energy demand, necessitating cross-discipline mitigation strategies.
The paper, "Power Stabilization for AI Training Datacenters," argues that oscillating energy demand between the power-intensive GPU compute phase and the less-taxing communication phase, in which parallelized GPU calculations are synchronized, represents a barrier to the development of AI models. The authors note that the difference in power consumption between the two phases is extreme: the compute phase pushes GPUs toward their thermal limits, while the communication phase draws close to idle-time energy usage.
This variation in power demand occurs at the node (server) level and, because AI training is synchronous, simultaneously across the other nodes in the data center. The oscillations therefore become visible at the rack, datacenter, and power grid levels: imagine 50,000 hairdryers (~2,000 watts each) being switched on at once.
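To make the hairdryer analogy concrete, here is a minimal back-of-the-envelope sketch (not from the paper; the per-node wattages are assumptions for illustration) showing how synchronized phase changes across a fleet translate into an aggregate swing of tens of megawatts:

```python
# Hypothetical illustration: aggregate power swing when N training nodes
# oscillate in lockstep between a GPU compute phase and a near-idle
# communication phase. Per-node wattages below are assumed, not measured.

N_NODES = 50_000          # synchronized servers, echoing the hairdryer analogy
P_COMPUTE_W = 2_000.0     # assumed per-node draw near GPU thermal limits
P_COMM_W = 200.0          # assumed per-node draw during communication (near idle)

def aggregate_power_mw(per_node_watts: float, nodes: int = N_NODES) -> float:
    """Total fleet draw in megawatts for a given per-node phase power."""
    return per_node_watts * nodes / 1e6

peak = aggregate_power_mw(P_COMPUTE_W)   # every node in the compute phase
trough = aggregate_power_mw(P_COMM_W)    # every node in the communication phase
swing = peak - trough                    # the step the grid must absorb

print(f"peak {peak:.0f} MW, trough {trough:.0f} MW, swing {swing:.0f} MW")
# → peak 100 MW, trough 10 MW, swing 90 MW
```

Because training synchronizes the phase transitions, the swings add rather than average out, which is why the effect surfaces all the way up at the grid level.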
Read at The Register