Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang

2025-04-21

Summary

This paper asks whether training large language models with reinforcement learning (RL) actually makes them better at reasoning, or whether it just changes how they pick their answers.

What's the problem?

The problem is that some researchers have claimed that training language models with RL, especially by rewarding correct answers, gives these models new reasoning skills that they didn’t have before. This has led to confusion about what RL is really doing to the models.

What's the solution?

The researchers looked closely at how RL with verifiable rewards affects language models by comparing RL-trained models against their original base models while sampling many answers per problem. With only a few samples, the RL-trained models look better, but when enough samples are drawn, the base models end up solving just as many problems, and sometimes more. In other words, RL doesn't give the models new reasoning abilities; it just makes them more likely to pick answers of the kind that get rewarded, without expanding what they are truly capable of reasoning about.
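Comparisons like this are usually scored with the pass@k metric: the probability that at least one of k sampled answers is correct. Below is a minimal sketch of the standard unbiased pass@k estimator (computed from n total samples, c of which are correct); the example numbers are hypothetical, chosen only to illustrate how a model with a low per-sample success rate can still solve a problem when given many attempts.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    answers, drawn without replacement from n samples (c of them correct),
    is correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 5 correct answers out of 100 samples.
base_pass_1 = pass_at_k(n=100, c=5, k=1)    # 0.05 — weak at a single try
base_pass_64 = pass_at_k(n=100, c=5, k=64)  # ~0.995 — near-certain with 64 tries
```

This is why the number of samples matters: a base model that rarely succeeds on any single try can still "know how" to solve a problem, which only becomes visible at large k.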

Why it matters?

This matters because it clears up a big misunderstanding in the AI field. Knowing that RL doesn't magically make models better at reasoning helps researchers focus on methods that genuinely expand AI's reasoning abilities, instead of just tweaking how models choose among answers they could already produce.

Abstract

Despite initial claims, reinforcement learning with verifiable rewards does not introduce fundamentally new reasoning abilities to LLMs; instead, it enhances performance by biasing the output distribution toward rewarded paths without expanding the reasoning boundary.