Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

Ming Chen, Sheng Tang, Rong-Xi Tan, Ziniu Li, Jiacheng Chen, Ke Xue, Chao Qian

2025-12-09

Summary

This paper explores a new way to get large language models, which usually excel at text, to make accurate numerical predictions. It does this by treating prediction as generating a sequence, like writing a sentence, but with digits instead of words.

What's the problem?

The main issue is that large language models are trained to predict the next token in a sequence, which works well for text but doesn't directly translate to accurately predicting continuous numbers. Existing methods try to fix this by imposing constraints at the individual token level (for example, on each digit of a number). But these methods often fail to capture the overall size, or magnitude, of the number being predicted, which hurts their accuracy and their ability to generalize to new situations.
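A toy sketch (not from the paper) makes the magnitude problem concrete: under a token-level objective, a prediction that is wrong in its last digit and one that is wrong in its first digit look equally bad, even though their numeric errors differ by two orders of magnitude.

```python
# Toy illustration: token-level mismatch ignores numeric magnitude.
target = "523"

def token_mismatches(pred: str, tgt: str) -> int:
    """Count digit positions where prediction and target disagree."""
    return sum(p != t for p, t in zip(pred, tgt))

def numeric_error(pred: str, tgt: str) -> int:
    """Absolute error of the decoded numbers."""
    return abs(int(pred) - int(tgt))

for pred in ["524", "623"]:  # wrong last digit vs. wrong first digit
    print(pred, token_mismatches(pred, target), numeric_error(pred, target))
# Both predictions mismatch the target in exactly one token,
# but their numeric errors are 1 and 100 respectively.
```

A purely token-level loss treats these two mistakes the same, which is exactly the misalignment a sequence-level signal is meant to fix.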

What's the solution?

The researchers used a technique called Reinforcement Learning (RL) to train the model. They framed generating a number as a series of decisions and rewarded the model whenever the complete number it produced was close to the correct answer. Unlike supervision on individual digits, this sequence-level reward helps the model grasp the big picture and generate numbers that are both accurate and consistent. They used two RL algorithms, ReMax and GRPO, to achieve this.
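The sequence-level idea can be sketched in a few lines. This is a simplified illustration under assumed choices (a negative-absolute-error reward and a GRPO-style group-normalized advantage), not the paper's implementation: each sampled number string is scored as a whole, and its learning signal is its reward relative to the other samples in the group.

```python
# Sketch of a sequence-level reward with a GRPO-style advantage.
# Assumptions (not from the paper): reward = -|prediction - target|,
# malformed sequences get a large fixed penalty.
import statistics

def sequence_reward(pred: str, target: float) -> float:
    """Score the full decoded number, not individual tokens."""
    try:
        return -abs(float(pred) - target)  # closer => higher reward
    except ValueError:
        return -1e6  # unparseable output is heavily penalized

def group_relative_advantages(samples, target):
    """GRPO-style: normalize each reward within the sampled group."""
    rewards = [sequence_reward(s, target) for s in samples]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Four sampled completions for a target of 523:
advs = group_relative_advantages(["523", "520", "600", "abc"], 523.0)
# The exact match gets the highest advantage; the malformed one the lowest.
```

The key point is that the reward depends only on the decoded value of the whole sequence, so the policy is pushed toward globally coherent numbers rather than locally plausible digits.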

Why it matters?

This work matters because it shows that decoding-based regression, trained with sequence-level rewards through reinforcement learning, can be a highly effective way to use large language models for numerical prediction. The approach outperforms previous methods and proves reliable and precise across a wide range of numerical tasks, opening the door to using these powerful models in areas that require accurate number forecasting.

Abstract

Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm of applying large language models for numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches relying on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process, utilizing sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, showing the superiority of introducing sequence-level signals. Our analysis further reveals that RL significantly enhances sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.