Training Step-Level Reasoning Verifiers with Formal Verification Tools
Ryo Kamoi, Yusen Zhang, Nan Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang
2025-05-23
Summary
This paper introduces FoVer, an approach that uses formal verification tools to train AI models to check their own reasoning step by step, making that checking more accurate and reliable.
What's the problem?
When AI models try to solve complex problems, they can make mistakes at individual steps, and these errors are hard to catch: labeling each step by hand is slow, expensive, and not always consistent.
What's the solution?
The researchers developed FoVer, which uses formal verification tools to automatically mark which reasoning steps contain mistakes. These automatic labels are then used to train Process Reward Models (PRMs), which help the AI get better at spotting and fixing its own errors across different types of tasks.
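The labeling pipeline can be sketched roughly as follows. This is a minimal illustration, not the paper's actual code: the `verify_step` function and its toy check stand in for a real formal verification tool, and all names here are assumptions.

```python
# Illustrative sketch of step-level error labeling (not the paper's real API).

def verify_step(step: str) -> bool:
    """Stand-in for a formal verification tool: returns True if the
    step passes verification. Here we just flag an obviously false claim."""
    return "2 + 2 = 5" not in step

def label_solution(steps: list[str]) -> list[int]:
    """Label each reasoning step 1 (verified) or 0 (error detected) --
    the kind of automatic step-level supervision used to train a
    Process Reward Model."""
    return [1 if verify_step(s) else 0 for s in steps]

solution = [
    "Let x = 3.",
    "Then x + 1 = 4.",
    "So 2 + 2 = 5.",  # a flawed step the verifier should catch
]
print(label_solution(solution))  # → [1, 1, 0]
```

In the actual method, the verifier would be a formal tool (such as a theorem prover or symbolic checker) rather than a string match, but the output shape is the same: one correctness label per reasoning step.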
Why does it matter?
This matters because it allows AI to learn from its mistakes much more efficiently and improves its ability to generalize to new problems, making it more dependable for solving tough reasoning challenges.
Abstract
FoVer is a method for automatically annotating step-level error labels with formal verification tools and using them to train Process Reward Models. The resulting models significantly improve cross-task generalization and outperform models trained on human-annotated labels across various reasoning benchmarks.