SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
Xiao Liang, Zhong-Zhi Li, Yeyun Gong, Yang Wang, Hengyuan Zhang, Yelong Shen, Ying Nian Wu, Weizhu Chen
2025-06-16
Summary
This paper introduces SwS, a framework that helps large language models (LLMs) reason better by making them aware of their own weaknesses. It synthesizes new problems from the model's mistakes and uses them to train the model with reinforcement learning, providing rewards that can be verified for correctness to guide learning.
What's the problem?
Large language models often make errors in reasoning, but traditional training methods neither target the model's specific weak spots nor provide clear feedback on where it goes wrong. As a result, models improve their reasoning slowly and unreliably, because they are not learning directly from their own mistakes.
What's the solution?
The solution is SwS, which identifies the model's weaknesses from the problems it fails to solve and then generates new problems that target those failure patterns. The model is trained with reinforcement learning on these self-created problems, receiving verifiable rewards based on whether it solves them correctly. This focuses training on fixing weaknesses and provides clear signals that help the model improve its reasoning step by step.
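The loop above (evaluate, find weak areas, synthesize targeted problems, update with verifiable rewards) can be sketched as a toy simulation. Everything here is illustrative: the category names, the scalar "strengths" standing in for a real LLM policy, and all function names are assumptions, not the paper's actual method or API.

```python
import random

# Hypothetical sketch of the SwS loop. A "problem" is (category, difficulty);
# the toy model solves a problem only if its strength in that category meets
# the difficulty, standing in for a real LLM's pass/fail behavior.

CATEGORIES = ["algebra", "geometry", "combinatorics"]

def toy_solve(problem, strengths):
    category, difficulty = problem
    # Verifiable binary reward: 1 if the answer checks out, else 0.
    return 1 if strengths[category] >= difficulty else 0

def identify_weaknesses(problems, strengths):
    """Step 1: evaluate the model and estimate per-category failure rates."""
    failures = {c: 0 for c in CATEGORIES}
    counts = {c: 0 for c in CATEGORIES}
    for p in problems:
        counts[p[0]] += 1
        failures[p[0]] += 1 - toy_solve(p, strengths)
    return {c: failures[c] / counts[c] for c in CATEGORIES if counts[c]}

def synthesize_targeted(failure_rates, n, rng):
    """Step 2: sample new problems in proportion to failure rate,
    concentrating training on the model's weak categories."""
    cats = list(failure_rates)
    weights = [failure_rates[c] + 1e-6 for c in cats]  # avoid all-zero weights
    return [(rng.choices(cats, weights)[0], rng.random()) for _ in range(n)]

def rl_update(strengths, problems, lr=0.1):
    """Step 3: stand-in for the RL step -- nudge ability upward on each
    problem the model failed (reward 0)."""
    for category, difficulty in problems:
        if toy_solve((category, difficulty), strengths) == 0:
            strengths[category] += lr * (difficulty - strengths[category])
    return strengths

rng = random.Random(0)
strengths = {"algebra": 0.9, "geometry": 0.2, "combinatorics": 0.5}
eval_set = [(rng.choice(CATEGORIES), rng.random()) for _ in range(300)]

for _ in range(20):
    rates = identify_weaknesses(eval_set, strengths)
    batch = synthesize_targeted(rates, 50, rng)
    strengths = rl_update(strengths, batch)

print({c: round(v, 2) for c, v in strengths.items()})
```

Because synthesis is weighted by failure rate, the weakest category (here "geometry") receives the most new problems and improves fastest, which is the self-aware, weakness-driven behavior the framework describes.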
Why it matters?
This matters because reasoning is a key skill AI models need to solve complex tasks accurately. By making models aware of their own weak points and training them with clear, verifiable feedback, SwS helps build stronger models that reason more reliably and make fewer errors, benefiting applications such as decision-making, problem-solving, and understanding complicated information.
Abstract
A self-aware problem synthesis framework that leverages model weaknesses enhances reinforcement learning with verifiable rewards, improving large language model performance on reasoning tasks.