TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

Zhangchen Xu, Yuetai Li, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

2025-05-23

Summary

This paper introduces TinyV, a small but smart checking tool that helps train big AI language models more effectively by catching correct answers that older checking systems often mark as wrong.

What's the problem?

When teaching an AI to reason and solve problems, the systems that check whether an answer is right or wrong sometimes make mistakes of their own. In particular, they often mark good answers as bad (so-called false negatives), which confuses the AI and slows down its learning.
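
To make this concrete, here is an illustrative sketch (not taken from the paper) of how a strict rule-based verifier can produce a false negative: an exact string match rejects "1/2" even when the gold answer is the mathematically equivalent "0.5".

```python
def rule_based_verify(model_answer: str, gold_answer: str) -> bool:
    """Accept the answer only if it matches the gold string exactly."""
    return model_answer.strip() == gold_answer.strip()

# "1/2" equals "0.5" mathematically, but exact matching rejects it.
# During RL training this false negative hands the model a reward of 0
# for a correct solution, which is the problem TinyV targets.
print(rule_based_verify("1/2", "0.5"))  # False  (false negative)
print(rule_based_verify("0.5", "0.5"))  # True
```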

What's the solution?

The researchers created TinyV, a lightweight verifier built on a language model that does a better job of recognizing correct answers. With it, the AI gets more accurate feedback and learns faster during training. A simple sketch of the idea follows below.
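
Below is a minimal sketch of how such a hybrid reward signal might be wired up. It is an assumption-laden illustration, not the paper's implementation: llm_equivalent stands in for TinyV's LLM-based check, and is faked here with numeric parsing so the example runs on its own.

```python
from fractions import Fraction

def llm_equivalent(model_answer: str, gold_answer: str) -> bool:
    """Stand-in for an LLM-based equivalence check. Assumption: the real
    TinyV verifier is a small prompted language model; here we merely
    compare the two strings as numbers so the sketch is self-contained."""
    try:
        return Fraction(model_answer) == Fraction(gold_answer)
    except ValueError:
        return False

def hybrid_reward(model_answer: str, gold_answer: str) -> float:
    """Reward signal for RL: cheap exact match first, LLM check as fallback."""
    if model_answer.strip() == gold_answer.strip():
        return 1.0  # rule-based verifier already accepts
    # The rule-based check failed: re-check with the verifier model
    # before declaring the answer wrong.
    return 1.0 if llm_equivalent(model_answer, gold_answer) else 0.0

print(hybrid_reward("1/2", "0.5"))  # 1.0: the fallback rescues a false negative
print(hybrid_reward("3/4", "0.5"))  # 0.0: genuinely wrong answers still score 0
```

One plausible reason this stays lightweight is the ordering: the cheap rule check runs first, and only the answers it rejects are sent to the verifier model, so most training examples never pay for an extra model call.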

Why it matters?

This matters because it helps AI models become smarter and more reliable with less training, making them more useful for tasks like homework help, research, and anything else that needs good reasoning skills.

Abstract

TinyV, a lightweight LLM-based verifier, improves RL training of large language models by addressing false negatives from existing rule-based verifiers, enhancing reward accuracy and convergence speed.