Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Xiyun Xu, Yang Song, Yiming Jia, Yuntao Wen, Yunzhi Xu, Zekai Wang, Zhenwei An, Zhicong Sun, Zongchao Chen
2026-02-17
Summary
This paper introduces Nanbeige4.1-3B, a new language model that is surprisingly capable despite its small size. Within a single model, it can act as an intelligent agent, write code, and handle general reasoning and problem-solving.
What's the problem?
Existing small language models typically excel at only one or two tasks. To get a model that does many things well, you usually need a much larger model with tens of billions of parameters, which demands significant computing power and resources. The challenge was to create a small, open-source model that performs well across a broad range of abilities without needing those massive resources.
What's the solution?
The researchers tackled this by carefully designing how the model learns. To improve reasoning and keep answers aligned with human preferences, they combined two kinds of reward signals: point-wise rewards that score each response on its own, and pair-wise rewards that compare two responses to judge which is better. For coding, the rewards favor code that is both correct and efficient. They also developed a training method, with supervision at the level of individual tool-call turns, that lets the model use tools reliably over a long series of steps (up to 600 tool-call turns), enabling it to solve complex problems that require many actions. A rough sketch of the combined reward idea follows.
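To make the point-wise versus pair-wise distinction concrete, here is a minimal, hypothetical Python sketch of how the two signals could be blended into a single training reward. The function names, the toy heuristics standing in for learned reward models, and the weighting alpha are illustrative assumptions, not the paper's actual implementation.

```python
# Toy illustration (our assumption, not Nanbeige4.1-3B's training code) of
# blending a point-wise reward (an absolute score for a single response)
# with a pair-wise reward (a preference margin between two responses).

def pointwise_reward(response: str) -> float:
    """Stand-in for a learned scalar reward model: a trivial length heuristic."""
    return min(len(response) / 200.0, 1.0)

def pairwise_margin(candidate: str, reference: str) -> float:
    """Stand-in for a comparative reward model: positive if the candidate looks better."""
    return pointwise_reward(candidate) - pointwise_reward(reference)

def combined_reward(candidate: str, reference: str, alpha: float = 0.5) -> float:
    """Blend absolute quality with relative preference against a reference answer."""
    return alpha * pointwise_reward(candidate) + (1.0 - alpha) * pairwise_margin(candidate, reference)

if __name__ == "__main__":
    draft = "A short answer."
    revised = "A longer answer that walks through each reasoning step before concluding."
    print(combined_reward(revised, draft))  # higher when the candidate beats the reference
```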
Why does it matter?
This work shows that you don't necessarily need a huge model to achieve strong performance. Nanbeige4.1-3B, with only 3 billion parameters, performs as well as or even better than much larger models on several tasks. This is a big deal because it means more researchers and developers can work with powerful language models without needing access to enormous computing resources, potentially accelerating progress in the field.
Abstract
We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in reinforcement learning, optimizing both correctness and efficiency. For deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nanbeige4-3B-2511 and Qwen3-4B, and even surpasses much larger models, such as Qwen3-30B-A3B. Our results demonstrate that small models can achieve both broad competence and strong specialization simultaneously, redefining the potential of 3B-parameter models.
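As an illustration of what a "complexity-aware" code reward might look like, below is a small, hypothetical Python sketch that rewards generated code for passing tests and adds a capped bonus when it runs faster than a reference solution. The function names, the weighting, and the speedup cap are assumptions made for illustration; the paper does not specify this exact formulation.

```python
# Hypothetical sketch of a complexity-aware code reward (assumed form, not the
# paper's implementation): correctness from test cases, plus a capped bonus
# when the candidate runs faster than a reference solution.
import time
from typing import Callable, Sequence, Tuple

def complexity_aware_reward(
    candidate: Callable[[int], int],
    reference: Callable[[int], int],
    tests: Sequence[Tuple[int, int]],
    efficiency_weight: float = 0.3,
) -> float:
    # Correctness term: fraction of test cases the candidate answers correctly.
    correct = sum(candidate(x) == expected for x, expected in tests) / len(tests)
    if correct < 1.0:
        return correct  # no efficiency bonus for code that fails any test

    def runtime(fn: Callable[[int], int]) -> float:
        start = time.perf_counter()
        for x, _ in tests:
            fn(x)
        return time.perf_counter() - start

    # Efficiency term: speedup over the reference, capped so it cannot dominate.
    speedup = runtime(reference) / max(runtime(candidate), 1e-9)
    return correct + efficiency_weight * min(speedup, 2.0)

if __name__ == "__main__":
    # Reward a closed-form sum against a loop-based reference on a few test cases.
    tests = [(n, n * (n - 1) // 2) for n in (1_000, 10_000, 100_000)]
    fast = lambda n: n * (n - 1) // 2
    slow = lambda n: sum(range(n))
    print(complexity_aware_reward(fast, slow, tests))
```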