CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Shudong Liu, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang, Kai Chen
2025-08-06
Summary
This paper introduces CompassVerifier, a lightweight and accurate model designed to check and verify the answers produced by large language models (LLMs) across different subjects and tasks.
What's the problem?
Existing methods for checking LLM answers often rely on complicated custom rules or repeated manual effort, and they struggle to handle tricky or unusual cases across different areas.
What's the solution?
CompassVerifier addresses this with a robust verification model trained on a dedicated dataset called VerifierBench, which covers a wide range of real LLM answers and mistakes. This training helps it identify wrong or invalid answers and generalize well across domains.
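At its core, the verification task can be viewed as a function mapping a question, its gold answer, and a model's response to a verdict. The sketch below is a toy rule-based stand-in for illustration only; the label set and the matching logic are assumptions, not CompassVerifier's actual implementation, which uses a trained model to judge semantic equivalence.

```python
# Hypothetical sketch of a verifier interface. The labels "correct",
# "incorrect", and "invalid" and the string-matching logic are
# illustrative assumptions, not the paper's actual method.

def verify(question: str, gold_answer: str, model_response: str) -> str:
    """Return one of "correct", "incorrect", or "invalid"."""
    response = model_response.strip()
    if not response or response.lower().startswith("i cannot"):
        return "invalid"  # refusal or empty output
    # Toy equivalence check; a real verifier model judges semantics,
    # not substring overlap.
    if gold_answer.strip().lower() in response.lower():
        return "correct"
    return "incorrect"

print(verify("What is 2 + 3?", "5", "The answer is 5."))  # correct
print(verify("What is 2 + 3?", "5", "The answer is 6."))  # incorrect
print(verify("What is 2 + 3?", "5", ""))                  # invalid
```

A trained verifier replaces the substring check with a learned judgment, which is what lets it handle paraphrased answers and flag invalid outputs such as refusals.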
Why it matters?
This matters because reliable verification helps improve the trustworthiness of AI systems, supporting better evaluation and guiding LLMs to produce more accurate and useful responses.
Abstract
CompassVerifier is a lightweight, robust model for verifying LLM outputs across various domains, supported by VerifierBench, a comprehensive benchmark dataset.