Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

NVIDIA, Aaron Blakeman, Aaron Grattafiori, Aarti Basant, Abhibha Gupta, Abhinav Khattar, Adi Renduchintala, Aditya Vavre, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Kondratenko, Alexander Bukharin, Alexandre Milesi, Ali Taghibakhshi, Alisa Liu, Amelia Barton, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman

2025-12-25

Summary

This paper introduces Nemotron 3 Nano 30B-A3B, a new language model that combines two neural network architectures, Mamba and Transformer, with a Mixture-of-Experts design. It is built to be a powerful and efficient tool for understanding and generating text, with a focus on reasoning and agentic tasks.

What's the problem?

Existing large language models often require a lot of computing power to run, which makes them slow and expensive. They can also struggle with complex reasoning, acting as helpful assistants, and handling very long pieces of text. The goal was to create a model that is both accurate *and* efficient, improving on the previous generation, Nemotron 2.

What's the solution?

The researchers built Nemotron 3 Nano by first training it on a massive amount of text data – 25 trillion tokens, including more than 3 trillion tokens that were not used for Nemotron 2. They then refined its abilities through supervised fine-tuning and reinforcement learning on diverse environments, essentially teaching it to perform specific tasks and rewarding it for good responses. A key innovation is the hybrid Mamba-Transformer architecture combined with a Mixture-of-Experts design, which routes each token to only a small subset of the network's experts, so only a fraction of the parameters are active on each forward pass (a sketch of this routing idea is shown below). They also extended the context window so the model can process up to a million tokens at once.
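To make the Mixture-of-Experts idea concrete, here is a minimal, illustrative PyTorch sketch of top-k expert routing. This is not the paper's actual architecture (the real model also interleaves Mamba and attention layers and uses its own expert configuration); all sizes and names below are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Sparse Mixture-of-Experts feed-forward layer (illustrative only).

    A small router scores every token against every expert, and each token
    is sent to only its top-k experts, so most expert parameters stay
    inactive on any given forward pass.
    """

    def __init__(self, d_model: int = 512, d_hidden: int = 2048,
                 n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.router(x)                   # (batch, seq_len, n_experts)
        weights, expert_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                # each token's k chosen experts
            for e, expert in enumerate(self.experts):
                mask = expert_idx[..., slot] == e # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = TopKMoELayer()
    tokens = torch.randn(2, 16, 512)              # (batch, seq_len, d_model)
    print(layer(tokens).shape)                    # torch.Size([2, 16, 512])
```

Because each token only passes through k of the n_experts MLPs, the layer can hold many parameters while using only a fraction of them per token, which is the intuition behind the efficiency claims above.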

Why it matters?

Nemotron 3 Nano represents a significant step forward in language model technology. It delivers up to 3.3x higher inference throughput than similarly sized open models while scoring better on popular benchmarks, so it can process information and generate text more efficiently and effectively. Its improved reasoning and chat abilities make it better suited for building helpful AI assistants, and its ability to handle long texts opens up possibilities for working with complex documents and conversations. Finally, making the model publicly available allows other researchers and developers to build upon this work.

Abstract

We present Nemotron 3 Nano 30B-A3B, a Mixture-of-Experts hybrid Mamba-Transformer language model. Nemotron 3 Nano was pretrained on 25 trillion text tokens, including more than 3 trillion new unique tokens over Nemotron 2, followed by supervised fine tuning and large-scale RL on diverse environments. Nemotron 3 Nano achieves better accuracy than our previous generation Nemotron 2 Nano while activating less than half of the parameters per forward pass. It achieves up to 3.3x higher inference throughput than similarly-sized open models like GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507, while also being more accurate on popular benchmarks. Nemotron 3 Nano demonstrates enhanced agentic, reasoning, and chat abilities and supports context lengths up to 1M tokens. We release both our pretrained Nemotron 3 Nano 30B-A3B Base and post-trained Nemotron 3 Nano 30B-A3B checkpoints on Hugging Face.
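Since the checkpoints are released on Hugging Face, a natural starting point is loading them with the transformers library. The snippet below is a hypothetical usage sketch: the repository id is inferred from the model name and not verified, and the actual release may require different loading options, so check the official model card before use.

```python
# Hypothetical usage sketch for the released checkpoint.
# The repository id is assumed from the model name and is NOT verified;
# consult the official Hugging Face model card for the exact id and options.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-3-Nano-30B-A3B"  # assumed, not verified

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # spread the weights across available devices
)

prompt = "Explain why sparse Mixture-of-Experts models can be cheaper to run."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```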