NVIDIA Nemotron 3: Efficient and Open Intelligence
NVIDIA, Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alisa Liu, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman
2025-12-25
Summary
This paper introduces the Nemotron 3 family of AI models – Nano, Super, and Ultra – which are designed to excel as helpful assistants: acting on a user's behalf, reasoning through problems, and holding conversations.
What's the problem?
Existing AI models often struggle to handle long conversations or complex tasks efficiently, and they frequently underperform at reasoning or at using tools to solve problems. In addition, many powerful AI models are not openly available for others to use and build upon.
What's the solution?
The researchers created Nemotron 3 using a new architecture that combines the strengths of two different approaches, Mamba and Transformer, to process information quickly and handle very long inputs – up to a million tokens. The larger models, Super and Ultra, add techniques that improve quality (LatentMoE) and speed up text generation (multi-token prediction, or MTP). All of the models were then further trained using reinforcement learning, which is like giving the AI rewards for good reasoning and problem-solving. Finally, the team plans to release the model weights, training software, and data publicly.
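To make the hybrid idea concrete, here is a minimal, purely illustrative sketch of how Mamba-style and Transformer-style layers can be interleaved in one stack. This is not the Nemotron 3 implementation: the hidden size, layer count, interleaving ratio, and all parameter values are assumptions chosen for readability. The key contrast it shows is that the state-space block costs O(T) in sequence length while softmax attention costs O(T²), which is why hybrid stacks help at very long contexts.

```python
import math
import random

random.seed(0)
D = 4  # hidden size (illustrative, not the real model's)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_mat():
    return [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)]

def ssm_block(xs, A, B, C):
    # Mamba-style block: a linear state-space recurrence over the sequence.
    # One pass, constant state -> linear cost in sequence length T.
    h = [0.0] * D
    out = []
    for x in xs:
        u = matvec(B, x)
        h = [a * hi + ui for a, hi, ui in zip(A, h, u)]  # diagonal state update
        y = matvec(C, h)
        out.append([xi + yi for xi, yi in zip(x, y)])    # residual connection
    return out

def attention_block(xs, Wq, Wk, Wv):
    # Transformer-style block: causal softmax self-attention, quadratic in T.
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    out = []
    for t, q in enumerate(qs):
        scores = [sum(qi * ki for qi, ki in zip(q, ks[s])) / math.sqrt(D)
                  for s in range(t + 1)]                 # causal: past keys only
        m = max(scores)
        ws = [math.exp(s - m) for s in scores]
        z = sum(ws)
        ctx = [sum(w * vs[s][d] for s, w in enumerate(ws)) / z for d in range(D)]
        out.append([xi + ci for xi, ci in zip(xs[t], ctx)])  # residual connection
    return out

def hybrid_forward(xs, n_layers=6, attn_every=3):
    # Interleave: mostly SSM blocks with an occasional attention layer.
    # The 2:1 ratio here is an assumption for illustration only.
    for i in range(n_layers):
        if (i + 1) % attn_every == 0:
            xs = attention_block(xs, rand_mat(), rand_mat(), rand_mat())
        else:
            A = [random.uniform(0.5, 0.9) for _ in range(D)]  # stable decay
            xs = ssm_block(xs, A, rand_mat(), rand_mat())
    return xs

tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(6)]
out = hybrid_forward(tokens)
print(len(out), len(out[0]))  # 6 4
```

Because the SSM blocks carry a fixed-size recurrent state instead of attending over every previous token, most of the stack scales linearly with input length; the occasional attention layer restores precise token-to-token lookups where they matter.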
Why it matters?
These new models represent a step forward in AI capabilities, offering better performance, efficiency, and the ability to handle more complex tasks. The open release of the models and related resources will allow other researchers and developers to build upon this work, potentially leading to even more advanced AI applications in areas like IT ticket automation and general problem-solving.
Abstract
We introduce the Nemotron 3 family of models – Nano, Super, and Ultra. These models deliver strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens. Super and Ultra are trained with NVFP4 and incorporate LatentMoE, a novel approach that improves model quality. The two larger models also include MTP layers for faster text generation. All Nemotron 3 models are post-trained with multi-environment reinforcement learning, enabling reasoning, multi-step tool use, and granular reasoning-budget control. Nano, the smallest model, outperforms comparable models in accuracy while remaining extremely cost-efficient for inference. Super is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Ultra, the largest model, provides state-of-the-art accuracy and reasoning performance. Nano is released together with its technical report and this white paper, while Super and Ultra will follow in the coming months. We will openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights.