
Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Yiping Wang, Qing Yang, Zhiyuan Zeng, Liliang Ren, Lucas Liu, Baolin Peng, Hao Cheng, Xuehai He, Kuan Wang, Jianfeng Gao, Weizhu Chen, Shuohang Wang, Simon Shaolei Du, Yelong Shen

2025-04-30


Summary

This paper presents a way to make large language models much better at solving math problems using reinforcement learning, even when they have only a single training example to learn from.

What's the problem?

The problem is that language models often struggle with mathematical reasoning, and training them to improve usually requires large datasets and lots of feedback, which is slow and expensive.

What's the solution?

The researchers used reinforcement learning with a verifiable reward: the model earns a reward only when its answer can be automatically checked and confirmed correct. Surprisingly, they found that this setup could substantially improve the model's math skills even when it trained on just one example.
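To make the idea of a "verifiable reward" concrete, here is a minimal sketch (not the authors' actual code; the answer format and function names are assumptions for illustration): the model's final answer is extracted from its output and compared against the known ground truth, yielding a reward of 1.0 only when the check passes.

```python
# Illustrative sketch of a verifiable reward, assuming the model ends
# its solution with a line like "Answer: <value>". The marker format
# and helper names here are hypothetical, not from the paper.

def extract_final_answer(completion: str) -> str:
    """Take the text after the last 'Answer:' marker as the model's answer."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    if idx == -1:
        return ""  # no parsable answer -> will earn zero reward
    return completion[idx + len(marker):].strip()

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the ground truth."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

# During training, each sampled solution to the single training problem
# would be scored this way, and a policy-gradient method would push the
# model toward solutions that earn reward 1.0.
reward = verifiable_reward("Let x = 2, so x + 2 = 4. Answer: 4", "4")
```

Because the reward is computed by an automatic check rather than a learned judge, it cannot be fooled by plausible-sounding but wrong reasoning, which is what makes learning from so little data feasible.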

Why it matters?

This matters because it shows AI can learn complex skills like math reasoning much faster and with far less data, making such training more efficient and practical for real-world use in education, science, and technology.

Abstract

Reinforcement learning with verifiable reward using one training example significantly enhances math reasoning capabilities of large language models.