VerifiAgent: a Unified Verification Agent in Language Model Reasoning
Jiuzhou Han, Wray Buntine, Ehsan Shareghi
2025-04-03
Summary
This paper is about creating a tool that helps AI language models give more reliable and accurate answers by checking their work.
What's the problem?
AI language models are good at reasoning, but they often make mistakes or give answers that aren't trustworthy. Existing ways to check their work are usually tied to one specific model or one domain, need a lot of computing power, and don't carry over to other types of reasoning tasks.
What's the solution?
The researchers developed VerifiAgent, a tool that checks AI responses in two ways: first, it makes sure the answer is complete and consistent; second, it automatically chooses a verification tool suited to the type of reasoning involved, such as math, logic, or common sense.
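The two-level check can be illustrated with a minimal Python sketch. All names here (`Response`, `meta_verify`, `classify`, `math_tool`, `verify`) are hypothetical. The keyword-based `classify` is an assumed stand-in for VerifiAgent's own choice of reasoning type, and only an arithmetic tool is implemented. The sketch shows the control flow (completeness and consistency first, then a type-specific tool) and is not the authors' code.

```python
from __future__ import annotations

import ast
import math
import operator
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """A model response: the question, its reasoning steps, and the final answer."""
    question: str
    steps: list[str]
    final_answer: Optional[str]


def meta_verify(resp: Response) -> bool:
    """Level 1 (illustrative): completeness and consistency checks."""
    # Completeness: there must be at least one reasoning step and a final answer.
    if not resp.steps or resp.final_answer is None:
        return False
    # Consistency (crude stand-in): the final answer should appear in the last step.
    return resp.final_answer in resp.steps[-1]


def classify(question: str) -> str:
    """Hypothetical keyword heuristic standing in for the agent's choice of reasoning type."""
    words = set(question.lower().replace("?", " ").split())
    if any(ch.isdigit() for ch in question):
        return "math"
    if words & {"if", "implies", "therefore", "all", "every"}:
        return "logic"
    return "commonsense"


_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node: ast.AST) -> float:
    """Safely evaluate an arithmetic AST (numbers, + - * /, unary minus)."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def math_tool(resp: Response) -> bool:
    """Level 2 math tool: recompute 'What is <expr>?' and compare with the final answer."""
    expr = resp.question.rstrip("?").split(" is ", 1)[-1]
    try:
        expected = _eval(ast.parse(expr.strip(), mode="eval"))
        return math.isclose(float(resp.final_answer), expected)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False


# Only a math tool is registered; logic or commonsense tools would be added here.
TOOLS: dict[str, Callable[[Response], bool]] = {"math": math_tool}


def verify(resp: Response) -> tuple[bool, str]:
    """Run meta-verification first, then the tool chosen for the reasoning type."""
    if not meta_verify(resp):
        return False, "failed meta-verification"
    kind = classify(resp.question)
    tool = TOOLS.get(kind)
    if tool is None:
        return True, f"passed meta-verification; no {kind} tool in this sketch"
    ok = tool(resp)
    return ok, f"{kind} tool {'passed' if ok else 'failed'}"


if __name__ == "__main__":
    good = Response("What is 12 * (3 + 4)?", ["3 + 4 = 7", "12 * 7 = 84"], "84")
    bad = Response("What is 12 * (3 + 4)?", ["3 + 4 = 7", "12 * 7 = 86"], "86")
    print(verify(good))  # (True, 'math tool passed')
    print(verify(bad))   # (False, 'math tool failed')
```

Note that the wrong answer `86` passes the first level (it is complete and internally consistent) and is only caught by the math tool, which is why the second level is needed.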
Why does it matter?
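This work matters because it can make AI language models more reliable and accurate, which is important for applications where people rely on AI for information and decision-making. In the authors' experiments, VerifiAgent beat baseline verification methods on all the reasoning tasks tested, its feedback helped models improve their answers, and on math problems it reached better results than existing process reward models while generating fewer samples at lower cost.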
Abstract
Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, requiring significant computational resources and lacking scalability across diverse reasoning tasks. To address these limitations, we propose VerifiAgent, a unified verification agent that integrates two levels of verification: meta-verification, which assesses completeness and consistency in model responses, and tool-based adaptive verification, where VerifiAgent autonomously selects appropriate verification tools based on the reasoning type, including mathematical, logical, or commonsense reasoning. This adaptive approach ensures both efficiency and robustness across different verification scenarios. Experimental results show that VerifiAgent outperforms baseline verification methods (e.g., deductive verifier, backward verifier) across all reasoning tasks. Additionally, it can further enhance reasoning accuracy by leveraging feedback from verification results. VerifiAgent can also be effectively applied to inference scaling, achieving better results with fewer generated samples and at lower cost than existing process reward models in the mathematical reasoning domain. Code is available at https://github.com/Jiuzhouh/VerifiAgent
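The feedback and inference-scaling claims can be illustrated with a minimal Python sketch. It assumes that the verifier acts as a stopping criterion: candidates are generated one at a time, the feedback from each rejection is passed to the next generation, and sampling stops at the first accepted answer. A best-of-N setup with a process reward model, by contrast, generates all N samples before choosing one. This is one plausible reading rather than the paper's documented procedure, and the repository linked above is the authoritative source. `verify_then_stop`, `toy_generator`, and `toy_verifier` are hypothetical stubs standing in for an LLM and for VerifiAgent.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional, Tuple

# A generator maps verifier feedback (None on the first call) to a new candidate answer.
Generate = Callable[[Optional[str]], str]
# A verifier maps a candidate to (accepted?, feedback text).
Verify = Callable[[str], Tuple[bool, str]]


def verify_then_stop(generate: Generate, verify: Verify,
                     max_samples: int) -> tuple[Optional[str], int]:
    """Sample one candidate at a time and stop at the first one the verifier accepts.

    Returns (accepted answer or None, number of samples generated).
    """
    feedback: Optional[str] = None
    for n in range(1, max_samples + 1):
        candidate = generate(feedback)
        ok, feedback = verify(candidate)
        if ok:
            return candidate, n
    return None, max_samples  # no verified answer within the sample budget


def toy_generator(candidates: Iterable[str], seen: list) -> Generate:
    """Stub for an LLM: replays fixed candidates and records the feedback it receives."""
    it = iter(candidates)

    def generate(feedback: Optional[str]) -> str:
        seen.append(feedback)  # a real model would condition its next answer on this
        return next(it)

    return generate


def toy_verifier(candidate: str) -> tuple[bool, str]:
    """Stub verifier: accept only the correct value of 12 * (3 + 4)."""
    ok = candidate == str(12 * (3 + 4))
    return ok, "accepted" if ok else f"{candidate} != 12 * (3 + 4); recheck the product"


if __name__ == "__main__":
    seen: list = []
    answer, used = verify_then_stop(toy_generator(["86", "84", "84"], seen),
                                    toy_verifier, max_samples=8)
    print(answer, used)  # 84 2
    print(seen)          # [None, '86 != 12 * (3 + 4); recheck the product']
```

Because the loop stops early, the number of generated samples depends on how soon a candidate passes verification; in the worst case it uses the full budget, just as best-of-N would.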