Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
Rihui Xin, Han Liu, Zecheng Wang, Yupeng Zhang, Dianbo Sui, Xiaolin Hu, Bingning Wang
2025-05-27
Summary
This paper presents a way to train large language models to solve math problems with reinforcement learning, using the format and length of their answers as stand-in reward signals instead of requiring ground-truth answers.
What's the problem?
Training AI to solve math problems usually requires a large number of correct answers as examples, and these can be hard to obtain, especially for very difficult or rare problems.
What's the solution?
The researchers showed that by rewarding patterns in how answers are written, namely their structure (format) and their length, the AI can learn to solve math problems as well as, or even better than, models trained with ground-truth answers.
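To make the idea concrete, here is a minimal sketch of what a reward built only from format and length could look like. It is not the authors' reward function: the `\boxed{...}` format check, the triangular length score, the `target` and `width` parameters, the whitespace token count, and all function names are assumptions chosen for illustration.

```python
import re

# Hypothetical format rule: exactly one \boxed{...} final answer in the response.
BOXED = re.compile(r"\\boxed\{[^{}]+\}")


def format_reward(response: str) -> float:
    """Return 1.0 if the response contains exactly one boxed answer, else 0.0."""
    return 1.0 if len(BOXED.findall(response)) == 1 else 0.0


def length_reward(num_tokens: int, target: int = 200, width: int = 200) -> float:
    """Triangular score (assumed shape): 1.0 at `target`, 0.0 at `width` tokens away."""
    return max(0.0, 1.0 - abs(num_tokens - target) / width)


def surrogate_reward(response: str, target: int = 200, width: int = 200) -> float:
    """Combine both signals; the correct answer is never consulted."""
    fmt = format_reward(response)
    if fmt == 0.0:
        # Badly formatted responses earn nothing, regardless of length.
        return 0.0
    num_tokens = len(response.split())  # crude whitespace token count (assumption)
    return fmt + length_reward(num_tokens, target, width)


if __name__ == "__main__":
    print(surrogate_reward("x = 2 so \\boxed{2}", target=4, width=4))
```

In a reinforcement learning loop, a score like this would stand in for the usual correctness check when ranking sampled responses. Gating the length term on the format check is one possible design choice; it keeps a long but malformed response from being rewarded.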
Why does it matter?
This matters because it reduces the need for large amounts of verified answer data, which could make it easier and faster to build helpful math tools for students, teachers, and anyone else who needs math help.
Abstract
The research demonstrates that using format and length as surrogate signals can improve LLMs' performance in mathematical problem-solving, matching or surpassing traditional methods without extensive ground truth data.