Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning
Jiuzhou Han, Wray Buntine, Ehsan Shareghi
2025-08-05
Summary
This paper presents uncertainty-based methods that automatically construct per-step reward data and aggregate model outputs in mathematical reasoning tasks, helping models learn to solve problems step by step.
What's the problem?
Traditional reward systems give feedback only on the final answer, which makes it hard to guide models during the reasoning process; manually labeling reward data for every intermediate step is time-consuming and expensive.
What's the solution?
The paper introduces an uncertainty-driven framework that automatically constructs reward data for intermediate reasoning steps and aggregates model outputs effectively, improving both the training and the use of Process-Level Reward Models that supervise each step of reasoning.
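To make the aggregation idea concrete, here is a minimal sketch of one plausible uncertainty-weighted scheme: each reasoning step gets a reward estimate plus an uncertainty value (e.g., variance across an ensemble of scorers), and steps with high uncertainty are down-weighted when computing a solution-level score. The function name, inputs, and weighting rule are illustrative assumptions, not the paper's actual method.

```python
def aggregate_step_rewards(step_rewards, step_uncertainties, eps=1e-8):
    """Combine per-step reward scores into one solution-level score.

    step_rewards: floats in [0, 1], one per reasoning step.
    step_uncertainties: non-negative floats (e.g., ensemble variance);
        higher means the reward estimate is less reliable.

    NOTE: inverse-uncertainty weighting is an illustrative assumption,
    not the scheme proposed in the paper.
    """
    # Weight each step by the inverse of its uncertainty,
    # so unreliable step scores contribute less.
    weights = [1.0 / (u + eps) for u in step_uncertainties]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, step_rewards)) / total
```

For example, a solution with step scores [0.9, 0.2, 0.8] and uncertainties [0.1, 2.0, 0.1] ends up scored higher than its plain mean (about 0.63), because the low 0.2 score is also the least reliable estimate and is heavily discounted.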
Why it matters?
This matters because guiding AI models throughout the reasoning process helps them reach more accurate and logical solutions to math problems and other complex tasks, making AI systems more capable and reliable.
Abstract
An uncertainty-driven framework for automated process reward data construction and aggregation methods improves the effectiveness and efficiency of Process-Level Reward Models in mathematical reasoning tasks.