LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Xingyu Wu, Yuchen Yan, Shangke Lyu, Linjuan Wu, Yiwen Qiu, Yongliang Shen, Weiming Lu, Jian Shao, Jun Xiao, Yueting Zhuang
2025-07-25
Summary
This paper introduces LAPO (Length-Adaptive Policy Optimization), a method that helps large language models reason more efficiently by letting them decide how much reasoning effort each problem deserves.
What's the problem?
Large language models often generate unnecessarily long explanations, even for simple problems, which wastes compute and slows them down.
What's the solution?
The researchers created LAPO, a reinforcement learning framework that trains the model to match its reasoning length to a problem's difficulty. The model first discovers the typical lengths of its own successful solutions, then internalizes this knowledge so it can adapt its reasoning length on the fly; a rough sketch of the idea follows below.
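To make this concrete, here is a minimal Python sketch of how a length-aware reward could be assembled from the two ingredients the summary describes: a reference length drawn from successful solutions, and a reward that favors correct answers near that length. The function names (`target_length`, `length_adaptive_reward`), the use of the median, and the `alpha` weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def target_length(successful_lengths, percentile=50):
    """Pick a reference reasoning length from the lengths of
    successful rollouts on a problem.

    Using the median (percentile=50) is an illustrative choice;
    the paper's exact statistic may differ.
    """
    return float(np.percentile(successful_lengths, percentile))

def length_adaptive_reward(is_correct, response_length, ref_length, alpha=0.5):
    """Combine answer correctness with a bonus for staying close
    to the reference length. `alpha` is a hypothetical weight."""
    if not is_correct:
        return 0.0  # no length bonus for wrong answers
    # Smooth closeness-to-reference term in (0, 1]: it peaks when the
    # response length matches ref_length and decays as it strays.
    closeness = np.exp(-((response_length - ref_length) / ref_length) ** 2)
    return 1.0 + alpha * closeness
```

In a full training pipeline, a reward like this would feed into a standard policy-gradient update; this sketch only illustrates how length information from successful solutions could shape the reward signal.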
Why it matters?
This matters because it lets AI models spend compute in proportion to a problem's difficulty, cutting token usage while improving accuracy, so they run faster and cheaper across a range of tasks.
Abstract
Length-Adaptive Policy Optimization (LAPO) reduces token usage and improves accuracy by enabling models to internally manage reasoning depth through reinforcement learning.