Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
Zhen Zhang, Changyi Yang, Zijie Xia, Zhen Yang, Chengzhi Liu, Zhaotiao Weng, Yepeng Liu, Haobo Chen, Jin Pan, Chenyang Zhao, Yuheng Bu, Alkesh Patel, Zhe Gan, Xin Eric Wang
2026-05-01
Summary
This paper introduces a new method called the Length Value Model, or LenVM, which helps large language models (like those powering chatbots) better understand and control how long their responses are. It's about making these models more efficient and accurate when generating text.
What's the problem?
Currently, large language models struggle to precisely control the length of the text they generate. They typically operate at a coarse level, deciding the overall length of a response rather than reasoning about it token by token. This creates problems for both cost (longer responses take more computing power) and performance, since the ideal length can be crucial for good reasoning. Existing methods don't give a detailed, step-by-step understanding of how length evolves during the generation process.
What's the solution?
The researchers developed LenVM, which assigns a 'value' to each individual token (word piece) generated by the model, representing how much 'length' remains in the response. Think of it like a countdown timer. Each token 'costs' a little bit of the remaining length. This is done by giving a small negative reward for each token created, which helps the model learn to predict how many tokens are left. Importantly, this method doesn't require any extra labeled data; it learns from the process of generating text itself.
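The countdown idea above can be sketched concretely. In reinforcement-learning terms, a constant reward of -1 per token makes each position's discounted return a function of how many tokens remain. This is a minimal illustrative sketch (not the authors' code), assuming a discount factor `gamma` and a reward of exactly -1 per generated token:

```python
# Hedged sketch: the discounted-return target each token's value would
# regress toward, assuming reward = -1 per token and discount gamma.
def length_value_targets(seq_len: int, gamma: float = 0.99) -> list[float]:
    """For position t with H = seq_len - t tokens remaining, the
    discounted sum of -1 rewards is:
        V(t) = -(1 - gamma**H) / (1 - gamma)
    More tokens remaining -> more negative value, so V is a bounded,
    monotone proxy for the remaining generation length."""
    return [-(1 - gamma ** (seq_len - t)) / (1 - gamma)
            for t in range(seq_len)]

targets = length_value_targets(seq_len=5, gamma=0.9)
# Early tokens (many tokens left) get more negative targets than late
# ones; the final token's target is exactly -1 (a single -1 reward).
```

Note that these targets come for free from the sequence itself: no human annotation is needed, only the token positions, which matches the paper's claim that the supervision is annotation-free and dense.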
Why it matters?
This work is important because it significantly improves the ability of language models to generate responses of a specific length. Experiments show it boosts performance on tasks requiring exact length matching and allows for a better trade-off between accuracy and efficiency. It also provides insights into *why* a model chooses a certain length, offering a way to understand its reasoning process and potentially improve future models through reinforcement learning. Ultimately, LenVM makes these powerful models more controllable and useful.
Abstract
The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy compared to 6% for the token-budget baseline. It also accurately predicts total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
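Because the discounted return is a monotone function of the remaining horizon, a value estimate can be inverted back into a predicted remaining length, which is the kind of readout the abstract's length-prediction and budget-control results rely on. The mapping below is a hedged sketch derived from the constant-reward formulation (the function name and the closed-form inversion are illustrative assumptions, not taken from the paper's code):

```python
import math

# Hedged sketch: invert the bounded discounted return
#     V = -(1 - gamma**H) / (1 - gamma)
# to recover the implied remaining generation length H from a value
# estimate. Requires value > -1 / (1 - gamma), the bound of V.
def remaining_length(value: float, gamma: float = 0.99) -> float:
    # gamma**H = 1 + value * (1 - gamma); solve for H.
    return math.log(1 + value * (1 - gamma)) / math.log(gamma)

# Round trip: a horizon of 120 remaining tokens maps to a value and back.
gamma = 0.99
v = -(1 - gamma ** 120) / (1 - gamma)
h = remaining_length(v, gamma)  # recovers 120 (up to float precision)
```

Under this formulation, reading the value at the prompt boundary gives a length forecast before any tokens are generated, and thresholding it during decoding gives a continuous knob for the accuracy/efficiency trade-off described above.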