CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Shudong Liu, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang, Kai Chen
2025-08-06
Summary
This paper introduces CompassVerifier, a lightweight and accurate model designed to check and verify the answers produced by large language models (LLMs) across different subjects and tasks.
What's the problem?
Existing methods for checking LLM answers often rely on complicated custom rules or repeated manual effort, and they struggle to handle tricky or unusual cases across different areas.
What's the solution?
CompassVerifier addresses this with a robust verification model trained on a dedicated dataset called VerifierBench, which covers a wide range of real LLM answers and mistakes. This training helps it identify wrong or invalid answers and generalize well across domains.
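At its core, the verification task can be viewed as a function mapping a question, its gold answer, and a model's response to a verdict. The sketch below is a toy rule-based stand-in for illustration only; the label set and the matching logic are assumptions, not CompassVerifier's actual implementation, which uses a trained model to judge semantic equivalence.

```python
# Hypothetical sketch of a verifier interface. The labels "correct",
# "incorrect", and "invalid" and the string-matching logic are
# illustrative assumptions, not the paper's actual method.

def verify(question: str, gold_answer: str, model_response: str) -> str:
    """Return one of "correct", "incorrect", or "invalid"."""
    response = model_response.strip()
    if not response or response.lower().startswith("i cannot"):
        return "invalid"  # refusal or empty output
    # Toy equivalence check; a real verifier model judges semantics,
    # not substring overlap.
    if gold_answer.strip().lower() in response.lower():
        return "correct"
    return "incorrect"

print(verify("What is 2 + 3?", "5", "The answer is 5."))  # correct
print(verify("What is 2 + 3?", "5", "The answer is 6."))  # incorrect
print(verify("What is 2 + 3?", "5", ""))                  # invalid
```

A trained verifier replaces the substring check with a learned judgment, which is what lets it handle paraphrased answers and flag invalid outputs such as refusals.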
Why it matters?
This matters because reliable verification helps improve the trustworthiness of AI systems, supporting better evaluation and guiding LLMs to produce more accurate and useful responses.
Abstract
CompassVerifier is a lightweight, robust model for verifying LLM outputs across various domains, supported by VerifierBench, a comprehensive benchmark dataset.