PretrainZero: Reinforcement Active Pretraining
Xingrun Xing, Zhiyuan Fan, Jie Lou, Guoqi Li, Jiajun Zhang, Debing Zhang
2025-12-04
Summary
This paper introduces a new way to train AI models, called PretrainZero, to make them better at general reasoning – thinking and problem-solving like humans. It aims to move beyond AI that only excels at specific tasks by letting the model learn from a huge body of general knowledge, such as all of Wikipedia.
What's the problem?
Current AI models, especially those trained with reinforcement learning, are very good at the things they are specifically trained for, like coding or solving math problems. However, they need clear, verifiable rewards to learn, which limits their ability to handle more open-ended, general reasoning tasks. Essentially, they depend on constant feedback and, unlike humans, struggle to learn from raw experience. They rely too much on having someone *tell* them whether they are right or wrong.
What's the solution?
PretrainZero tackles this by using a method called 'active pretraining'. Imagine a student who actively chooses what to study based on what they find interesting and helpful. PretrainZero does something similar: it learns to pick out useful information from a large text source (Wikipedia) and then tries to predict that information using reinforcement learning. Crucially, it does this *without* needing any pre-labeled data or external rewards – it learns from the text itself. It also gradually increases the difficulty of what it tries to predict, making it constantly challenge itself and improve its reasoning skills.
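The loop described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the span scorer (`informativeness`), the reward, and the fixed curriculum of span lengths are all hypothetical stand-ins for the learned policy and RL machinery the paper actually uses.

```python
# Toy sketch of "active pretraining": pick an informative span, mask it,
# and score a prediction against the original text (self-supervised reward).
from collections import Counter

def informativeness(span, corpus_counts):
    """Score a span by average token rarity -- a hypothetical proxy for the
    'informative content' PretrainZero's policy learns to identify."""
    return sum(1.0 / (1 + corpus_counts[t]) for t in span) / len(span)

def select_span(tokens, span_len, corpus_counts):
    """Actively pick the most informative span of a given length."""
    best_start = max(
        range(len(tokens) - span_len + 1),
        key=lambda i: informativeness(tokens[i:i + span_len], corpus_counts),
    )
    return best_start, tokens[best_start:best_start + span_len]

def self_supervised_reward(prediction, target):
    """Reward computed from the text itself (token overlap with the masked
    span), so no labels or external reward model are needed."""
    hits = sum(p == t for p, t in zip(prediction, target))
    return hits / len(target)

# --- tiny demo corpus ---
text = ("the capital of france is paris and paris is known "
        "for the eiffel tower").split()
counts = Counter(text)

# Curriculum: increasingly long (harder) masked spans.
for span_len in (1, 2, 3):
    start, target = select_span(text, span_len, counts)
    masked = text[:start] + ["<mask>"] * span_len + text[start + span_len:]
    # A real policy would generate its prediction by reasoning over the
    # masked context; here we fake a guess just to exercise the reward.
    guess = (target[:-1] + ["wrong"]) if span_len > 1 else ["wrong"]
    print(span_len, start, round(self_supervised_reward(guess, target), 2))
```

The key point the sketch tries to convey is that both the "what to study" signal (span selection) and the learning signal (reward) come from the corpus itself, with difficulty ratcheted up over time.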
Why it matters?
This research is important because it’s a step towards creating AI that can truly think and learn like humans, known as artificial general intelligence. By breaking the reliance on specific rewards and labeled data, PretrainZero opens the door to AI that can reason more broadly and adapt to new situations without needing constant supervision. This could lead to AI systems that are much more versatile and capable of solving complex, real-world problems.
Abstract
Mimicking human behavior to actively learn from general experience and achieve artificial general intelligence has long been a dream. Recent reinforcement learning (RL) based large-thinking models demonstrate impressive expert-level abilities, e.g., in software and math, but still rely heavily on verifiable rewards in specific domains, creating a significant bottleneck for extending the performance boundary of general reasoning capabilities. In this work, we propose PretrainZero, a reinforcement active learning framework built on the pretraining corpus that extends RL from domain-specific post-training to general pretraining. PretrainZero has the following characteristics: 1) Active pretraining: inspired by the active learning ability of humans, PretrainZero learns a unified reasoning policy that actively identifies reasonable and informative contents in the pretraining corpus and reasons to predict these contents via RL. 2) Self-supervised learning: without any verifiable labels, pretrained reward models, or supervised fine-tuning, we directly pretrain reasoners from 3B to 30B base models on the general Wikipedia corpus using RL, significantly breaking the verification data wall for general reasoning. 3) Verification scaling: by tackling increasingly challenging masked spans, PretrainZero substantially enhances the general reasoning abilities of pretrained base models. In reinforcement pretraining, PretrainZero improves Qwen3-4B-Base by 8.43, 5.96, and 10.60 points on MMLU-Pro, SuperGPQA, and math average benchmarks, respectively. In post-training, the pretrained models can also serve as reasoning foundation models for downstream RLVR tasks.