xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Ding Chen, Qingchen Yu, Pengyuan Wang, Wentao Zhang, Bo Tang, Feiyu Xiong, Xinchi Li, Minchuan Yang, Zhiyu Li

2025-04-16

Summary

This paper introduces xVerify, a new tool designed to check whether the answers produced by AI reasoning models are correct, especially for questions that have clear, objective answers.

What's the problem?

As AI models get better at answering questions that require reasoning, it is becoming harder and more time-consuming for people to verify whether those answers are actually right. Existing verification methods are either too slow, not accurate enough, or unable to handle a wide range of question types.

What's the solution?

The researchers developed xVerify, a faster and more reliable way to judge whether an AI's answer is equivalent to the correct reference answer. They tested xVerify on different types of objective questions and found that it outperformed other tools, with higher accuracy and F1 scores, meaning its judgments struck a strong balance between precision and recall.
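To make the idea of equivalence judgment and F1 scoring concrete, here is a minimal illustrative sketch. This is not xVerify's actual implementation (xVerify is a trained model-based judge); the normalization rules and function names below are assumptions for illustration only, and the F1 formula is the standard one over binary correct/incorrect verdicts.

```python
def normalize(answer: str) -> str:
    """Crude normalization so superficially different but equivalent
    answers compare equal (illustrative; a real verifier handles
    LaTeX, units, paraphrases, and more)."""
    return answer.strip().lower().rstrip(".")

def is_equivalent(model_answer: str, gold_answer: str) -> bool:
    # Hypothetical stand-in for a learned equivalence judgment.
    return normalize(model_answer) == normalize(gold_answer)

def f1_score(predictions, labels):
    """F1 over binary verdicts: harmonic mean of precision and recall."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Example: the verifier's verdicts vs. human ground-truth labels.
pairs = [("42.", "42"), ("Paris ", "paris"), ("7", "8")]
verdicts = [is_equivalent(m, g) for m, g in pairs]   # [True, True, False]
human_labels = [True, True, True]                     # "7" was actually right
print(f1_score(verdicts, human_labels))               # 0.8
```

A higher F1 here means the verifier rarely marks wrong answers as correct (precision) while also rarely missing answers that are in fact correct (recall), which is why the paper reports it alongside plain accuracy.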

Why it matters?

This matters because a trustworthy and efficient way to check AI answers helps researchers improve their models more quickly and ensures that the information people get from these systems is actually correct. It also saves time and effort compared to manual checking, making it easier to deploy AI in real-world situations where accuracy is important.

Abstract

xVerify, an efficient answer verifier, demonstrates strong equivalence judgment capabilities in evaluating reasoning models across various objective questions, with F1 scores and accuracy surpassing other methods.