Temporal Consistency for LLM Reasoning Process Error Identification

Jiacheng Guo, Yue Wu, Jiahao Qiu, Kaixuan Huang, Xinzhe Juan, Ling Yang, Mengdi Wang

2025-03-19

Summary

This paper introduces a new way to detect mistakes in the step-by-step reasoning an AI produces when solving math problems.

What's the problem?

It's hard to trust an AI's math solutions because errors can appear in any of the intermediate steps it takes, and a single verification pass often misses them.

What's the solution?

The researchers created a system in which the AI re-checks its own work over multiple rounds, reflecting on its previous assessments each time, and only settles on a verdict once its judgments stay consistent across rounds.
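The iterative re-checking described above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation; `verify_step` is a hypothetical stand-in for an LLM call that judges a reasoning step and may condition on its own previous judgment, and the stopping rule (accept once the last few judgments agree) is an assumed simplification of the paper's temporal-consistency idea.

```python
def temporally_consistent_verdict(verify_step, solution_step,
                                  stable_rounds=3, max_rounds=10):
    """Re-verify a reasoning step until the judgment stays unchanged
    for `stable_rounds` consecutive rounds (temporal consistency).

    `verify_step(step, prev_verdict)` is a hypothetical LLM verifier call:
    it returns a judgment (e.g. "correct"/"incorrect") and sees its own
    previous verdict so it can self-reflect on the earlier check.
    """
    history = []
    for _ in range(max_rounds):
        prev = history[-1] if history else None
        verdict = verify_step(solution_step, prev)  # self-reflection round
        history.append(verdict)
        # Accept only once the verdict has been stable for several rounds.
        if len(history) >= stable_rounds and len(set(history[-stable_rounds:])) == 1:
            return history[-1]
    return history[-1]  # budget exhausted: fall back to the latest judgment
```

For example, with a verifier whose judgment flips once and then stabilizes at "correct", the loop keeps re-checking until three consecutive rounds agree, then returns "correct".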

Why it matters?

This work is important because it makes AI more reliable at mathematical reasoning, which is useful in fields like science, engineering, and finance that depend on correct multi-step calculations.

Abstract

Verification is crucial for effective mathematical reasoning. We present a new temporal consistency method where verifiers iteratively refine their judgments based on the previous assessment. Unlike one-round verification or multi-model debate approaches, our method leverages consistency in a sequence of self-reflection actions to improve verification accuracy. Empirical evaluations across diverse mathematical process error identification benchmarks (Mathcheck, ProcessBench, and PRM800K) show consistent performance improvements over baseline methods. When applied to the recent DeepSeek R1 distilled models, our method demonstrates strong performance, enabling 7B/8B distilled models to outperform all 70B/72B models and GPT-4o on ProcessBench. Notably, the distilled 14B model with our method achieves performance comparable to DeepSeek-R1. Our code is available at https://github.com/jcguo123/Temporal-Consistency