Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
Ling-Team, Ang Li, Ben Liu, Binbin Hu, Bing Li, Bingwei Zeng, Borui Ye, Caizhi Tang, Changxin Tian, Chao Huang, Chao Zhang, Chen Qian, Chenchen Ju, Chenchen Li, Chengfu Tang, Chili Fu, Chunshao Ren, Chunwei Wu, Cong Zhang, Cunyin Peng, Dafeng Xu, Daixin Wang
2025-11-04
Summary
This paper introduces Ling 2.0, a new family of powerful language models designed to excel at reasoning. The models range in size up to a massive one trillion parameters, and they are built on a sparse design that keeps them efficient even at that huge scale.
What's the problem?
Existing large language models, while impressive, often struggle with complex reasoning tasks and can be incredibly expensive to run because they require a lot of computing power. The challenge is to create models that are both highly capable in reasoning *and* practical to use, meaning they don't require enormous resources.
What's the solution?
The researchers tackled this by building Ling 2.0 on a 'Mixture-of-Experts' (MoE) architecture, in which only a small fraction of the model's parameters is active for any given token, making it far more efficient than a dense model of the same size. They also carefully curated the data the model learns from, incorporated techniques that encourage reasoning during training, and optimized the hardware and software stack to handle the massive scale of the trillion-parameter model. Finally, they trained in a low-precision number format (FP8 training) to make the calculations faster and more memory-efficient.
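To make the Mixture-of-Experts idea concrete, here is a minimal sketch of top-k expert routing: a router scores all experts for a token, but only the k best experts actually run, so compute scales with k rather than with the total expert count. This is an illustrative toy in NumPy, not Ling 2.0's actual architecture; the expert shapes, router, and k=2 choice are assumptions for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x, experts, gate_w, k=2):
    """Route one token embedding to its top-k experts and mix their outputs.

    x       : (d,) token embedding
    experts : list of (W, b) pairs, each a tiny feed-forward expert (assumed form)
    gate_w  : (d, n_experts) router weights
    k       : number of experts activated per token
    """
    logits = x @ gate_w                 # one router score per expert
    topk = np.argsort(logits)[-k:]      # indices of the k highest-scoring experts
    weights = softmax(logits[topk])     # renormalize gate over the chosen experts
    # Only the selected experts are evaluated: the other n_experts - k are skipped,
    # which is the source of the active-compute savings.
    out = sum(w * np.maximum(x @ W + b, 0.0)
              for w, (W, b) in zip(weights, (experts[i] for i in topk)))
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [(rng.normal(size=(d, d)) * 0.1, np.zeros(d)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts)) * 0.1
y = moe_forward(rng.normal(size=d), experts, gate_w, k=2)
print(y.shape)  # (8,)
```

With 16 experts and k=2, only 2/16 of the expert parameters do work per token, which is the same kind of sparsity that lets a trillion-parameter MoE run with a small active-compute budget.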
Why it matters?
Ling 2.0 represents a significant step forward in building AI that can truly reason and solve complex problems. By demonstrating that it's possible to create a highly accurate and efficient trillion-parameter model, this work paves the way for even more advanced AI systems in the future, and provides a foundation for other models like the 'Ring' series.
Abstract
We introduce Ling 2.0, a series of reasoning-oriented language foundation models built upon the principle that every activation boosts reasoning capability. Designed to scale from tens of billions to one trillion parameters under a unified Mixture-of-Experts (MoE) paradigm, Ling 2.0 emphasizes high sparsity, cross-scale consistency, and efficiency guided by empirical scaling laws. The series includes three non-thinking (instruct) models - Ling-mini-2.0, Ling-flash-2.0, and Ling-1T - ranging from 16B to 1T total parameters and achieving up to 7-fold active-compute efficiency compared with dense counterparts. Ling 2.0 integrates coordinated innovations across model architecture, pre-training, post-training, and infrastructure: a high-sparsity MoE with MTP for efficient reasoning, reasoning-oriented data and mid-training CoT activation, reinforcement-based fine-tuning (DFT, Evo-CoT), and full-scale FP8 training with fine-grained heterogeneous pipelines. At the trillion scale, Ling-1T establishes a new Pareto frontier of reasoning accuracy versus computational efficiency, demonstrating that sparse activation, when properly aligned with reasoning objectives, enables scalable and efficient intelligence. Collectively, Ling 2.0 provides a coherent, open, and efficient foundation for advancing future reasoning and thinking models, including the Ring series built upon the same base.