How Nvidia is using emulation to turn AI FLOPS into FP64
Briefly
"Double precision floating point computation (aka FP64) is what keeps modern aircraft in the sky, rockets going up, vaccines effective, and, yes, nuclear weapons operational. But rather than building dedicated chips that process this essential data type in hardware, Nvidia is leaning on emulation to increase performance for HPC and scientific computing applications, an area where AMD has had the lead in recent generations."
"This emulation, we should note, hasn't replaced hardware FP64 in Nvidia's GPUs. Nvidia's newly unveiled Rubin GPUs still deliver about 33 teraFLOPS of peak FP64 performance, but that's actually one teraFLOP less than the now four-year-old H100. If you switch on software emulation in Nvidia's CUDA libraries, the chip can purportedly achieve up to 200 teraFLOPS of FP64 matrix performance. That's 4.4x of what its outgoing Blackwell accelerators could muster in hardware."
FP64 double-precision computation remains essential for high-accuracy scientific and engineering workloads. Rather than adding dedicated FP64 hardware, Nvidia is relying on software emulation to boost throughput for HPC and scientific computing. Its Rubin GPUs retain about 33 teraFLOPS of native FP64 performance, slightly below the H100, but can reach roughly 200 teraFLOPS of FP64 matrix performance via emulation in Nvidia's CUDA libraries. Nvidia reports that the emulated results are comparable in accuracy to native FP64 hardware. Some competitors and researchers caution that strong benchmark results do not yet prove equivalence on full, real-world scientific simulations, so further validation is needed.
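The mechanism behind those numbers is worth a sketch: instead of routing a matrix multiply through the GPU's scarce FP64 units, the emulation decomposes each double-precision operand into several lower-precision slices and rebuilds the result from partial products on the fast AI matrix units, in the spirit of the published Ozaki scheme. Below is a minimal NumPy illustration of that split-and-accumulate idea. The function names, the slice count and width, and the use of plain FP64 matmuls for the partial products are illustrative assumptions, not Nvidia's CUDA implementation, which aligns exponents so the partial products can run exactly on low-precision tensor cores.

```python
import numpy as np

def split_fp64(a, num_slices=3, slice_bits=18):
    """Split an FP64 matrix into num_slices matrices whose elements each
    carry only ~slice_bits of mantissa, so a - sum(slices) is ~0.

    Conceptual sketch only: real emulators align exponents per row/column
    so each slice becomes a small integer that low-precision tensor cores
    can multiply exactly.
    """
    tiny = np.finfo(np.float64).tiny
    slices = []
    residual = a.copy()
    for _ in range(num_slices):
        # Keep the top slice_bits of each element's remaining mantissa.
        exp = np.floor(np.log2(np.abs(residual) + tiny))
        scale = np.maximum(2.0 ** (exp - (slice_bits - 1)), tiny)
        top = np.round(residual / scale) * scale
        slices.append(top)
        residual = residual - top
    return slices

def emulated_matmul(a, b, num_slices=3):
    """Approximate an FP64 GEMM as a sum of narrow-mantissa partial GEMMs."""
    a_slices = split_fp64(a, num_slices)
    b_slices = split_fp64(b, num_slices)
    c = np.zeros((a.shape[0], b.shape[1]))
    for i, ai in enumerate(a_slices):
        for j, bj in enumerate(b_slices):
            # Skip cross terms whose contribution falls below the target
            # accuracy; the kept terms stand in for the low-precision GEMMs
            # that would run on the tensor cores.
            if i + j < num_slices:
                c += ai @ bj
    return c

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))
err = np.max(np.abs(emulated_matmul(a, b) - a @ b))
print(f"max abs deviation from native FP64 GEMM: {err:.3e}")
```

The payoff of the decomposition is throughput: a handful of narrow partial products on the abundant AI matrix units can outrun a single multiply on the few native FP64 pipes, which is how emulated FP64 matrix math can claim several times the hardware figure.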
Read at The Register