Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
Zhenpeng Su, Leiyu Pan, Xue Bai, Dening Liu, Guanting Dong, Jiaming Huang, Wenping Hu, Guorui Zhou
2025-08-12
Summary
This paper introduces Klear-Reasoner, a new AI model that is very good at solving difficult problems that require long and careful thinking, especially in math and programming. It uses a carefully designed training process to get better at reasoning step by step.
What's the problem?
Many existing AI models struggle to reason through long, complicated problems because their training methods suppress important learning signals: the clipping used in standard reinforcement learning discards gradients from exactly the tokens the model most needs to learn from, and hard examples are not fully exploited. This makes it difficult for models to learn to think deeply and carefully.
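For background (this is the standard PPO objective from the reinforcement learning literature, not an equation taken from this summary): the widely used clipped surrogate caps the policy ratio $r_t(\theta)$, and because the clip is a hard cutoff, tokens whose ratio falls outside the clip range contribute zero gradient:

```latex
L^{\text{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
      \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
```

This zeroed gradient on clipped tokens is the "suppressed learning signal" the paper's method is designed to recover.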
What's the solution?
The paper presents a detailed training recipe for Klear-Reasoner with two stages. First, long Chain-of-Thought supervised fine-tuning teaches the model to work through problems step by step. Second, reinforcement learning uses a new technique called Gradient-Preserving Clipping Policy Optimization (GPPO), which keeps the learning signal from tokens that standard clipping would discard, improving the model's ability to explore different solutions and learn from its mistakes.
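The core idea can be illustrated with a toy forward-mode autodiff scalar. This is a minimal sketch of the *kind* of mechanism GPPO uses (the forward value is still clipped, but the gradient is allowed through, stop-gradient style); the exact formulation, including how the preserved gradient is scaled, is an assumption here and should be taken from the paper itself:

```python
class Dual:
    """Minimal forward-mode autodiff scalar: a value and its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

def clip_standard(r, lo, hi):
    """Hard clip as in vanilla PPO: the derivative is zeroed outside [lo, hi]."""
    if r.val < lo:
        return Dual(lo, 0.0)
    if r.val > hi:
        return Dual(hi, 0.0)
    return r

def clip_gradient_preserving(r, lo, hi):
    """Sketch of the GPPO idea: the forward value is still clipped, but the
    ratio's derivative passes through (hypothetical unscaled variant; the
    paper's actual construction may scale or bound this gradient)."""
    if r.val < lo:
        return Dual(lo, r.dot)
    if r.val > hi:
        return Dual(hi, r.dot)
    return r

# A token whose importance ratio (1.5) falls outside the clip range [0.8, 1.2]:
ratio = Dual(1.5, 1.0)   # derivative 1.0 w.r.t. the policy parameter
advantage = 1.0

std = clip_standard(ratio, 0.8, 1.2)
gpp = clip_gradient_preserving(ratio, 0.8, 1.2)

print(std.val * advantage, std.dot * advantage)  # 1.2 0.0 -> no learning signal
print(gpp.val * advantage, gpp.dot * advantage)  # 1.2 1.0 -> gradient preserved
```

Both variants produce the same clipped objective value, so training stays stable; the difference is only in the backward pass, where the gradient-preserving version still lets the clipped token influence the update.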
Why it matters?
This matters because AI models that can reason deeply and handle complex tasks are more useful for solving problems in areas like math and coding. AI with stronger reasoning can assist in education, research, and technology development by providing more accurate and thoughtful solutions.
Abstract
Klear-Reasoner, a model with long reasoning capabilities, achieves high performance across benchmarks through a detailed post-training workflow, including long Chain-of-Thought supervised fine-tuning and reinforcement learning with Gradient-Preserving Clipping Policy Optimization.