Nvidia Releases Nemotron 3 Super, a 120B Open AI Model Built for Agentic Workloads
Briefly

"Nemotron 3 Super utilizes a Mixture-of-Experts architecture, activating only 12.7 billion parameters per forward pass, which helps in reducing the compute costs for AI agents."
"The model delivers up to 7.5 times more throughput than Qwen3.5-122B-A10B, addressing the challenges of extended reasoning chains and high token usage in multi-agent pipelines."
"With a hybrid Mamba-Transformer backbone, Nemotron 3 Super supports context windows of up to one million tokens, avoiding the memory penalties of traditional attention designs."
"The LatentMoE routing system compresses token embeddings into a low-rank space, allowing for the activation of 22 experts at a time, enhancing efficiency without increasing inference costs."
Nvidia has introduced Nemotron 3 Super, a 120-billion-parameter open hybrid model that activates only 12.7 billion parameters per forward pass. In agent workloads it delivers up to 7.5 times the throughput of Qwen3.5-122B-A10B, using a Mixture-of-Experts architecture to cut the compute costs of extended reasoning chains and heavy token usage. A hybrid Mamba-Transformer backbone supports context windows of up to one million tokens without the memory penalties of traditional attention designs, and the LatentMoE routing system compresses token embeddings into a low-rank space so that many experts can be activated at once without raising inference costs.
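Two of the claims can be sanity-checked with simple arithmetic, sketched below. The 120B and 12.7B figures come from the announcement; the layer and head counts used for the cache estimate are assumed placeholders, not published numbers, and the measured 7.5x result against Qwen3.5-122B-A10B is a separate benchmark claim that these ratios do not reproduce.

```python
# Back-of-envelope check of the announced parameter counts. Decode
# FLOPs scale with *active* parameters, roughly 2 FLOPs per parameter
# per token.
total_params  = 120e9    # Nemotron 3 Super total parameters
active_params = 12.7e9   # parameters active per forward pass (MoE)

print(f"active fraction: {active_params / total_params:.1%}")               # ~10.6%
print(f"dense-vs-sparse FLOP ratio: {total_params / active_params:.1f}x")   # ~9.4x

# Why the hybrid Mamba backbone matters at a 1M-token context: a
# standard attention layer keeps a KV cache that grows linearly with
# sequence length, while a Mamba (state-space) layer carries a
# fixed-size recurrent state regardless of context length.
n_layers, n_kv_heads, head_dim = 60, 8, 128    # assumed, not published
seq_len, bytes_per_val = 1_000_000, 2          # fp16 cache entries

kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val
print(f"full-attention KV cache at 1M tokens: {kv_cache_bytes / 2**30:.0f} GiB")  # ~229 GiB
```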
Read at news.bitcoin.com