The new capabilities center on two integrated components: the Dynamo Planner Profiler and the SLO-based Dynamo Planner. Together, these tools address the "rate matching" challenge in disaggregated serving. Teams use the term when they split inference workloads, separating prefill operations, which process the input context, from decode operations, which generate output tokens, and running the two phases on different GPU pools. Without the right tools, teams spend considerable time working out the optimal GPU allocation between those phases.
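To make the rate-matching problem concrete, here is a minimal back-of-the-envelope sketch in Python. It is not Dynamo's API or the Planner's algorithm: the `Workload` class, the `rate_match` function, and every throughput figure are hypothetical, chosen only to illustrate the arithmetic. It assumes each pool scales linearly with GPU count and ignores batching effects, KV-cache transfer, and latency targets.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical steady-state traffic profile (illustrative only)."""
    requests_per_s: float
    input_tokens: int    # average prompt length, handled by prefill
    output_tokens: int   # average generation length, handled by decode


def rate_match(w: Workload,
               prefill_tok_per_s_per_gpu: float,
               decode_tok_per_s_per_gpu: float) -> tuple[int, int]:
    """Size each pool so prefill and decode both keep up with the request rate.

    Assumes linear scaling per GPU, which real systems only approximate.
    """
    prefill_demand = w.requests_per_s * w.input_tokens    # tokens/s to ingest
    decode_demand = w.requests_per_s * w.output_tokens    # tokens/s to generate
    prefill_gpus = math.ceil(prefill_demand / prefill_tok_per_s_per_gpu)
    decode_gpus = math.ceil(decode_demand / decode_tok_per_s_per_gpu)
    return prefill_gpus, decode_gpus


if __name__ == "__main__":
    # Made-up numbers: 10 req/s, 2,000-token prompts, 250-token answers.
    workload = Workload(requests_per_s=10, input_tokens=2000, output_tokens=250)
    print(rate_match(workload,
                     prefill_tok_per_s_per_gpu=8000,
                     decode_tok_per_s_per_gpu=600))  # prints (3, 5)
```

If either pool is undersized, it becomes the bottleneck and the other sits partly idle, which is why the ratio between the two matters as much as the total GPU count.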
But the machine is far from the fastest GPU in Nvidia's lineup. It's not going to beat out an RTX 5090 in large language model (LLM) inference, fine-tuning, or even image generation, never mind gaming. What the DGX Spark, and the slew of GB10-based systems hitting the market tomorrow, can do is run models that the 5090, or any other consumer graphics card on the market today, simply can't.