PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
Jingcheng Hu, Yinmin Zhang, Shijie Shang, Xiaobo Yang, Yue Peng, Zhewei Huang, Hebin Zhou, Xin Wu, Jie Cheng, Fanqi Wan, Xiangwen Kong, Chengyuan Yao, Kaiwen Yan, Ailin Huang, Hongyu Zhou, Qi Han, Zheng Ge, Daxin Jiang, Xiangyu Zhang, Heung-Yeung Shum
2026-01-13
Summary
This paper introduces a new way to make large language models think through problems more thoroughly, called Parallel Coordinated Reasoning (PaCoRe).
What's the problem?
Current language models struggle when a problem requires a lot of thinking. They reason step by step in a single sequence, so extending that chain of thought burns compute quickly and soon runs into the model's fixed context window, the hard cap on how much text it can hold at once. For complex tasks, this sequential approach is both slow and restrictive.
What's the solution?
PaCoRe tackles this by letting the model explore many different lines of reasoning *at the same time*. It's like having a team of thinkers working on a problem and then sharing their insights. Each 'thinker' generates a full reasoning trajectory, those trajectories are compacted into short messages, and the messages guide the next round of thinking. Repeating this over multiple rounds lets the model effectively consider millions of tokens of reasoning without ever exceeding its context window. The model learns the required synthesis skills through reinforcement learning that rewards it only for reaching correct final answers.
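The round structure described above can be sketched in a few lines. This is a toy illustration only: the names `generate`, `compact`, and `synthesize`, and the stub logic inside them, are hypothetical placeholders, whereas the actual system trains a single model end-to-end to perform all three roles.

```python
# Toy sketch of PaCoRe-style multi-round coordinated reasoning.
# `generate`, `compact`, and `synthesize` are illustrative stand-ins
# for model calls, not the paper's real implementation.

def generate(problem, messages, seed):
    # One parallel reasoning trajectory (stub: a candidate answer
    # plus a short note about how it was reached).
    return {"answer": (seed * 7) % 10, "note": f"tried seed {seed}"}

def compact(trajectory):
    # Compress a long trajectory into a short, context-bounded message.
    return trajectory["note"] + f" -> {trajectory['answer']}"

def synthesize(problem, messages):
    # Combine the final round's messages into one answer
    # (stub: majority vote over the answers they carry).
    answers = [int(m.rsplit("-> ", 1)[1]) for m in messages]
    return max(set(answers), key=answers.count)

def pacore(problem, rounds=3, width=8):
    messages = []
    for r in range(rounds):
        # Launch `width` parallel trajectories, each conditioned on
        # the compact messages from the previous round.
        trajectories = [generate(problem, messages, seed=r * width + i)
                        for i in range(width)]
        # Compact every trajectory so the total context stays bounded:
        # only messages, never full trajectories, cross round boundaries.
        messages = [compact(t) for t in trajectories]
    return synthesize(problem, messages)
```

The key property the sketch preserves is that effective compute grows with `rounds * width` full trajectories, while the context passed between rounds stays the size of a few short messages.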
Why does it matter?
This is important because it allows language models to solve much harder problems, especially in areas like math, where deep reasoning is crucial. The paper's 8B model reaches 94.5% on the HMMT 2025 math competition, surpassing GPT-5's 93.2%, demonstrating that scaling up the 'thinking' process in parallel can yield real gains over frontier systems.
Abstract
We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale test-time compute (TTC) far beyond sequential reasoning under a fixed context window. PaCoRe departs from the traditional sequential paradigm by driving TTC through massive parallel exploration coordinated via a message-passing architecture in multiple rounds. Each round launches many parallel reasoning trajectories, compacts their findings into context-bounded messages, and synthesizes these messages to guide the next round and ultimately produce the final answer. Trained end-to-end with large-scale, outcome-based reinforcement learning, the model masters the synthesis abilities required by PaCoRe and scales to multi-million-token effective TTC without exceeding context limits. The approach yields strong improvements across diverse domains, and notably pushes reasoning beyond frontier systems in mathematics: an 8B model reaches 94.5% on HMMT 2025, surpassing GPT-5's 93.2% by scaling effective TTC to roughly two million tokens. We open-source model checkpoints, training data, and the full inference pipeline to accelerate follow-up work.