RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback
Qiaoyu Tang, Hao Xiang, Le Yu, Bowen Yu, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun, Junyang Lin
2025-07-23
Summary
This paper introduces RefCritic, a long chain-of-thought critic module trained with reinforcement learning that improves language models' reasoning by critiquing their solutions and supplying refinement feedback.
What's the problem?
Language models often produce flawed or poorly explained reasoning, and critic models built with conventional supervised fine-tuning tend to give shallow critiques that do little to actually improve the underlying answers.
What's the solution?
The researchers created RefCritic, which trains a critic model with reinforcement learning guided by two rule-based rewards: one for correctly judging whether a solution is right, and one for whether the critique actually helps the model refine its answer, so the critic learns to produce feedback that drives step-by-step improvement.
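The dual rule-based reward described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: the function names, weights, and exact reward rules are hypothetical, chosen only to show how a judgment signal and a refinement signal could combine into one scalar reward.

```python
# Hypothetical sketch of a dual rule-based reward (names are ours, not the
# paper's). Two signals are combined:
#   1. judgment reward: did the critic correctly label the solution as
#      correct or incorrect?
#   2. refinement reward: after the policy model revises its answer using
#      the critique, what fraction of refined attempts match the gold answer?

def dual_reward(
    critic_verdict: bool,        # critic claims the original solution is correct
    solution_is_correct: bool,   # ground-truth check of the original solution
    refined_answers: list[str],  # answers the policy produced after the critique
    gold_answer: str,            # reference answer used by the rule checker
    w_judge: float = 1.0,        # assumed weight for the judgment signal
    w_refine: float = 1.0,       # assumed weight for the refinement signal
) -> float:
    """Combine the two rule-based signals into a scalar reward."""
    # Signal 1: correctness of the critic's verdict on the solution.
    judge_reward = 1.0 if critic_verdict == solution_is_correct else 0.0

    # Signal 2: refinement accuracy, meaningful only when the original
    # solution was wrong and the critique should drive a revision.
    if solution_is_correct or not refined_answers:
        refine_reward = 0.0
    else:
        refine_reward = sum(a == gold_answer for a in refined_answers) / len(refined_answers)

    return w_judge * judge_reward + w_refine * refine_reward
```

For example, a critic that correctly flags a wrong solution, and whose critique lets the policy fix the answer in half of its refined attempts, would score 1.0 + 0.5 = 1.5 under these assumed weights.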
Why does it matter?
This approach helps AI models produce more logical, carefully refined answers, improving their usefulness in domains that demand rigorous reasoning and clear explanation.
Abstract
RefCritic, a reinforcement-learning-based critic module trained with dual rule-based rewards, enhances model critique and refinement capabilities, outperforming supervised fine-tuning methods across multiple benchmarks.