INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning

Prime Intellect Team, Sami Jaghouar, Justus Mattern, Jack Min Ong, Jannik Straube, Manveer Basra, Aaron Pazdera, Kushal Thaman, Matthew Di Ferrante, Felix Gabriel, Fares Obeid, Kemal Erdem, Michael Keiblinger, Johannes Hagemann

2025-05-13

Summary

This paper introduces INTELLECT-2, a 32-billion-parameter language model that gets better at reasoning by being trained through a new, worldwide teamwork approach called globally decentralized reinforcement learning.

What's the problem?

The problem is that training very large language models to be good at reasoning usually takes a ton of computing power and resources, and it's hard to coordinate all the work when many different computers or people are involved, especially if they have different abilities or work at different speeds.

What's the solution?

The researchers built a fully asynchronous, decentralized system where contributors from all over the world could help train the model at the same time, even if their computers had different hardware or ran at different speeds. Combined with custom infrastructure and training tweaks, this approach helped INTELLECT-2 outperform a similar reasoning model called QwQ-32B.
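To give a feel for what "fully asynchronous" means here, below is a minimal, hypothetical sketch (not the authors' actual system): simulated contributors of different speeds push rollouts into a shared queue, and a trainer consumes whatever has arrived instead of waiting for the slowest machine. All names and timings are illustrative assumptions.

```python
import queue
import random
import threading
import time

# Hypothetical sketch of asynchronous rollout collection: heterogeneous
# "workers" produce rollouts at different speeds; the trainer never blocks
# on stragglers, it just consumes rollouts in completion order.

rollout_queue = queue.Queue()

def worker(worker_id, delay):
    """Simulate a contributor whose hardware takes `delay` seconds per rollout."""
    for step in range(3):
        time.sleep(delay)  # slower machines simply deliver rollouts later
        rollout_queue.put({"worker": worker_id, "step": step,
                           "reward": random.random()})

# Three contributors with different speeds, standing in for a global swarm.
threads = [threading.Thread(target=worker, args=(i, d))
           for i, d in enumerate([0.01, 0.03, 0.05])]
for t in threads:
    t.start()

# Trainer loop: apply an update as soon as any rollout is ready.
collected = 0
while collected < 9:
    batch = rollout_queue.get()  # arrives in completion order, not worker order
    collected += 1  # a real trainer would run an RL policy update here

for t in threads:
    t.join()
print(f"consumed {collected} rollouts asynchronously")
```

The key design point is that a synchronous scheme would wait for all three workers at every step, so throughput would be set by the slowest machine; the queue decouples generation from training, which is what lets a dynamic, heterogeneous swarm contribute usefully.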

Why it matters?

This matters because it shows we can train smarter and bigger AI models by letting people and computers from anywhere in the world work together, making advanced AI more accessible and powerful for everyone.

Abstract

INTELLECT-2 is a 32-billion-parameter language model trained with globally distributed reinforcement learning, using fully asynchronous RL across a dynamic, heterogeneous swarm of contributors. With custom infrastructure and training modifications, it improves upon QwQ-32B.