Data center growth faces hard limits from space, power and cooling, constraining both scale-up and scale-out approaches. Nvidia proposes a third axis, 'scale-across,' that links AI resources across separate physical locations to increase aggregate capacity. Spectrum-XGS Ethernet extends Spectrum-X so that AI chips in different data centers can operate as a unified system, using automatic congestion control, latency management, and end-to-end telemetry. According to Nvidia, benchmarks using its NCCL library show a 1.9x performance improvement over typical inter-data-center network speeds. The approach targets synchronized, cross-site AI workloads and reframes capacity expansion as connecting multiple facilities into AI "superfactories."
Nvidia believes AI-driven data center growth needs a new avenue. In its view, the desired AI capacity can only be reached by connecting different physical locations. Alongside scale-up and scale-out, this calls for what the chipmaker has dubbed 'scale-across'. The means to achieve it is Nvidia Spectrum-XGS Ethernet. An extension of the existing Spectrum-X Ethernet platform, the technology allows AI chips in different locations to behave as one giant 'superchip'.
Scale-up, or expanding a single system or rack, is limited by the capabilities of supporting infrastructure such as water cooling. However large the existing installation, there is a maximum wattage it can supply and cool. At the same time, the number of rack positions inside a data center is finite, so scale-out, or adding more racks and servers, also has a hard upper boundary.
Data centers already communicate with each other at high speed, but AI workloads are exceptionally demanding: synchronizing all processors across sites takes an eternity in hardware terms. Spectrum-XGS Ethernet aims to change this, turning linked data centers into what Nvidia calls AI "superfactories." Automatic congestion control, latency management, and end-to-end telemetry work together to deliver what Nvidia says is a 1.9x performance gain over typical network speeds between data centers. The figure comes from a benchmark using Nvidia's Collective Communications Library (NCCL, pronounced "Nickel").
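Why does cross-site synchronization hurt so much? A simplified textbook cost model of a ring all-reduce, one of the collective patterns that libraries such as NCCL implement, illustrates the point. This is a hypothetical sketch, not Nvidia's benchmark and not how NCCL or Spectrum-XGS actually schedule traffic: the function name `ring_allreduce_time`, the link bandwidth, and the latencies are all assumptions chosen for illustration. Because every step of the ring waits for its slowest link, a single long-haul hop slows the entire collective.

```python
from typing import Sequence


def ring_allreduce_time(message_bytes: float,
                        link_latency_s: Sequence[float],
                        link_bandwidth_Bps: Sequence[float]) -> float:
    """Estimate ring all-reduce time with a simple alpha-beta cost model.

    Hypothetical helper for illustration. The ring has one link per rank:
    link i carries traffic from rank i to rank (i + 1) % p. A ring
    all-reduce takes 2 * (p - 1) steps (reduce-scatter, then all-gather).
    In every step each rank sends one chunk of message_bytes / p to its
    neighbour at the same time, so a step ends only when the slowest link
    has delivered its chunk. Compute time and pipelining are ignored.
    """
    p = len(link_latency_s)
    if p != len(link_bandwidth_Bps):
        raise ValueError("need one latency and one bandwidth per ring link")
    if p < 2:
        return 0.0  # a single rank has nothing to synchronize
    chunk = message_bytes / p
    # The slowest link (latency + transfer time) sets the pace of every step.
    step = max(a + chunk / b for a, b in zip(link_latency_s, link_bandwidth_Bps))
    return 2 * (p - 1) * step


# Assumed setup: 16 GPUs, 1 GB of gradients, 400 Gb/s (50 GB/s) links.
RANKS, MSG, BW = 16, 1e9, 50e9
# Assumed latencies: 5 us inside a site, 1 ms between sites
# (roughly the one-way fiber delay over about 200 km).
LOCAL_LAT, REMOTE_LAT = 5e-6, 1e-3

one_site = ring_allreduce_time(MSG, [LOCAL_LAT] * RANKS, [BW] * RANKS)

# Same 16 GPUs split 8/8 across two sites: the ring crosses the
# inter-site link twice, on link 7 (rank 7 -> 8) and link 15 (rank 15 -> 0).
lat = [LOCAL_LAT] * RANKS
lat[7] = lat[15] = REMOTE_LAT
two_sites = ring_allreduce_time(MSG, lat, [BW] * RANKS)

print(f"one site:  {one_site * 1e3:.2f} ms")   # 37.65 ms
print(f"two sites: {two_sites * 1e3:.2f} ms")  # 67.50 ms
```

Under these assumed inputs the two-site ring takes roughly 1.8x as long, solely because two of sixteen links carry extra latency. That ratio is an artifact of the chosen numbers and is unrelated to Nvidia's 1.9x figure; it only shows why latency on cross-site hops matters for tightly synchronized workloads.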