Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Tong Zheng, Hongming Zhang, Wenhao Yu, Xiaoyang Wang, Xinyu Yang, Runpeng Dai, Rui Liu, Huiwen Bao, Chengsong Huang, Heng Huang, Dong Yu

2025-09-10

Summary

This paper introduces Parallel-R1, a reinforcement learning method that helps large language models (LLMs) handle complex reasoning by exploring multiple solution paths at the same time, rather than one step after another.

What's the problem?

Large language models still struggle with complicated reasoning tasks. Existing attempts to teach them to think in parallel (exploring different solution paths simultaneously) rely on supervised fine-tuning with pre-written example traces. That approach encourages imitation of the examples rather than teaching the model to explore and find solutions on its own. It's like memorizing answers instead of understanding the process.

What's the solution?

The researchers developed Parallel-R1, which trains the model in stages. First, supervised fine-tuning on example traces from easier problems teaches the model the basic format of parallel thinking. Then the model switches to reinforcement learning, a trial-and-error process in which it is rewarded for reaching correct answers on harder problems. This progressive curriculum overcomes the 'cold-start' problem of getting the model to even *try* different reasoning paths. As training progresses, the model first uses parallel thinking to explore, and later uses the same ability to double-check its work from multiple perspectives.
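To make the staged recipe above concrete, here is a minimal sketch of the two-stage training loop. The helpers (`sft_update`, `rollout_fn`, `rl_update`, and datasets with a `sample()` method) are hypothetical placeholders, not the paper's actual code; this is only an illustration of the structure described in the summary.

```python
def train_parallel_r1(model, easy_parallel_data, hard_problems,
                      sft_update, rollout_fn, rl_update,
                      sft_steps=1_000, rl_steps=5_000):
    """Two-stage curriculum sketch: SFT cold start, then RL on harder problems.

    `sft_update`, `rollout_fn`, and `rl_update` are assumed callables standing
    in for a real training stack (cross-entropy fine-tuning, sampling reasoning
    traces, and a PPO/GRPO-style policy update, respectively).
    """
    # Stage 1: imitate prompt-generated parallel-thinking traces on easier tasks,
    # so the model picks up the habit of writing several paths at once.
    for _ in range(sft_steps):
        sft_update(model, easy_parallel_data.sample())

    # Stage 2: reinforcement learning on harder problems with an outcome reward:
    # the model writes its own reasoning and is rewarded for correct final answers.
    for _ in range(rl_steps):
        problems = hard_problems.sample()
        rollouts = rollout_fn(model, problems)
        rewards = [1.0 if r.final_answer == r.reference_answer else 0.0
                   for r in rollouts]
        rl_update(model, rollouts, rewards)

    return model
```

The key design choice in this sketch is that the supervised stage only has to instill the habit of writing parallel paths; getting correct answers on hard problems is left entirely to the reward in the second stage.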

Why it matters?

This work is significant because it shows a way to genuinely teach LLMs to reason more flexibly and thoroughly. With parallel thinking enabled, the model not only becomes more accurate on challenging math problems but also learns to use the ability strategically: first for exploration, then for verification. This could lead to more robust and reliable AI systems capable of tackling complex real-world problems.

Abstract

Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging, as existing methods predominantly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced imitation rather than exploration and generalization. Different from them, we propose Parallel-R1, the first reinforcement learning (RL) framework that enables parallel thinking behaviors for complex real-world reasoning tasks. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel thinking ability, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully instills parallel thinking, leading to 8.4% accuracy improvements over the sequential thinking model trained directly on challenging tasks with RL. Further analysis reveals a clear shift in the model's thinking behavior: at an early stage, it uses parallel thinking as an exploration strategy, while in a later stage, it uses the same capability for multi-perspective verification. Most significantly, we validate parallel thinking as a mid-training exploration scaffold, where this temporary exploratory phase unlocks a higher performance ceiling after RL, yielding a 42.9% improvement over the baseline on AIME25. Our model, data, and code will be open-source at https://github.com/zhengkid/Parallel-R1.
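As a rough illustration of what a parallel-thinking trace can look like, the toy example below shows one response that explores two independent solution paths and then reconciles them. The tag names (`<Parallel>`, `<Path>`, `<Summary>`) and the exact layout are assumptions for illustration and may differ from the released data format.

```python
# Toy example of a parallel-thinking trace. The tags <Parallel>, <Path>, and
# <Summary> are assumed here for illustration only.
trace = """\
Question: What is the sum of the roots of x^2 - 5x + 6 = 0?
<Parallel>
  <Path> Factor: (x - 2)(x - 3) = 0, so the roots are 2 and 3, and their sum is 5. </Path>
  <Path> Vieta's formulas: for x^2 + bx + c = 0, the sum of the roots is -b, i.e. 5. </Path>
</Parallel>
<Summary> Both paths agree, so the answer is 5. </Summary>
Final answer: 5
"""
print(trace)
```

In the early, exploratory regime described in the abstract, the paths would propose genuinely different attacks on the problem; in the later, verification-oriented regime, the second path mainly serves to confirm the first.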