Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning
Jiuzhou Han, Wray Buntine, Ehsan Shareghi
2025-08-05
Summary
This paper presents uncertainty-based methods that automatically construct per-step reward data and aggregate model outputs in mathematical reasoning tasks, helping models learn to solve problems step by step.
What's the problem?
Traditional reward systems give feedback only on the final answer, which makes it hard to guide models during the reasoning process; manually labeling reward data for every intermediate step is time-consuming and expensive.
What's the solution?
The paper introduces an uncertainty-driven framework that automatically constructs reward data for intermediate reasoning steps and aggregates model outputs effectively, improving both the training and the use of Process-Level Reward Models that supervise each step of reasoning.
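To make the aggregation idea concrete, here is a minimal sketch of one plausible uncertainty-weighted scheme: each reasoning step gets a reward estimate plus an uncertainty value (e.g., variance across an ensemble of scorers), and steps with high uncertainty are down-weighted when computing a solution-level score. The function name, inputs, and weighting rule are illustrative assumptions, not the paper's actual method.

```python
def aggregate_step_rewards(step_rewards, step_uncertainties, eps=1e-8):
    """Combine per-step reward scores into one solution-level score.

    step_rewards: floats in [0, 1], one per reasoning step.
    step_uncertainties: non-negative floats (e.g., ensemble variance);
        higher means the reward estimate is less reliable.

    NOTE: inverse-uncertainty weighting is an illustrative assumption,
    not the scheme proposed in the paper.
    """
    # Weight each step by the inverse of its uncertainty,
    # so unreliable step scores contribute less.
    weights = [1.0 / (u + eps) for u in step_uncertainties]
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, step_rewards)) / total
```

For example, a solution with step scores [0.9, 0.2, 0.8] and uncertainties [0.1, 2.0, 0.1] ends up scored higher than its plain mean (about 0.63), because the low 0.2 score is also the least reliable estimate and is heavily discounted.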
Why it matters?
This matters because guiding AI models throughout the reasoning process helps them reach more accurate and logical solutions to math problems and other complex tasks, making AI systems more capable and reliable.
Abstract
An uncertainty-driven framework for automated process reward data construction and aggregation methods improves the effectiveness and efficiency of Process-Level Reward Models in mathematical reasoning tasks.