Skywork Open Reasoner 1 Technical Report

Jujie He, Jiacai Liu, Chris Yuhao Liu, Rui Yan, Chaojie Wang, Peng Cheng, Xiaoyu Zhang, Fuxiang Zhang, Jiacheng Xu, Wei Shen, Siyuan Li, Liang Zeng, Tianwen Wei, Cheng Cheng, Bo An, Yang Liu, Yahui Zhou

2025-05-29

Summary

This paper introduces Skywork-OR1, a new approach that uses reinforcement learning to help AI models reason through problems step by step, making them more accurate than previous models such as DeepSeek-R1.

What's the problem?

The problem is that when large language models are trained to solve complex problems using chain-of-thought reasoning, they sometimes become too confident in their answers too quickly, which causes them to stop exploring different ways to solve the problem. This issue, known as entropy collapse, limits the model's ability to find the best solution and can hurt its overall performance.
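Entropy here measures how spread out the model's probabilities are over candidate next tokens. The toy calculation below (with made-up numbers, not values from the report) illustrates why a "collapsed" distribution corresponds to a model that has stopped exploring:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A model still exploring: probability mass spread across several
# candidate reasoning steps (hypothetical numbers).
exploring = [0.4, 0.3, 0.2, 0.1]

# A model after entropy collapse: nearly all mass on one choice.
collapsed = [0.97, 0.01, 0.01, 0.01]

print(entropy(exploring))  # relatively high: many paths still in play
print(entropy(collapsed))  # near zero: the model commits too early
```

When entropy drops toward zero early in training, sampling the same prompt repeatedly yields nearly identical reasoning chains, so reinforcement learning has little diversity left to learn from.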

What's the solution?

To solve this, the researchers designed Skywork-OR1 to use reinforcement learning in a way that keeps the model exploring different reasoning paths instead of settling on one answer too soon. By addressing entropy collapse, Skywork-OR1 is able to improve its accuracy and outperform other models on various benchmarks.
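A standard way to keep a policy exploring is to add an entropy bonus to the RL objective, so the model is rewarded for staying uncertain longer. The sketch below shows that general idea only; the function, numbers, and coefficient are illustrative assumptions, not the report's actual training objective:

```python
def policy_loss(log_probs, advantages, entropies, entropy_coef=0.01):
    """Policy-gradient loss with an entropy bonus.

    A generic sketch of entropy-regularized RL, not Skywork-OR1's
    exact objective: subtracting entropy_coef * entropy lowers the
    loss when the policy keeps its distribution spread out.
    """
    # Standard policy-gradient term: push up log-probs of actions
    # with positive advantage.
    pg = -sum(lp * a for lp, a in zip(log_probs, advantages)) / len(log_probs)
    # Average per-step entropy of the policy's token distributions.
    ent = sum(entropies) / len(entropies)
    return pg - entropy_coef * ent

# With identical rewards, the higher-entropy rollout gets the lower loss,
# nudging the model to keep exploring alternative reasoning paths.
loss_exploring = policy_loss([-1.0, -1.0], [1.0, 1.0], [2.0, 2.0])
loss_collapsed = policy_loss([-1.0, -1.0], [1.0, 1.0], [0.1, 0.1])
print(loss_exploring < loss_collapsed)
```

Tuning the coefficient on this bonus trades off exploration against convergence speed, which is the knob such methods use to delay entropy collapse.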

Why it matters?

This is important because it means AI systems can become better at handling complicated tasks that require logical, step-by-step thinking, making them more reliable and useful for things like math, science, and decision-making.

Abstract

Skywork-OR1 is a reinforcement learning approach for long Chain-of-Thought models that improves accuracy over DeepSeek-R1 across various benchmarks by addressing entropy collapse.