The End of Manual Decoding: Towards Truly End-to-End Language Models

Zhichao Wang, Dongyang Ma, Xinting Huang, Deng Cai, Tian Lan, Jiahao Xu, Haitao Mi, Xiaoying Tang, Yan Wang

2025-10-31

Summary

This paper addresses a key limitation of large language models (LLMs): they aren't truly 'end-to-end', because getting them to generate good text still requires manual tweaking of decoding settings at inference time.

What's the problem?

Currently, LLMs rely on 'decoding strategies' to turn their output probabilities into actual text, and these strategies depend on hyperparameters like 'temperature' and 'top-p', which control how random or focused generation is. The problem is that finding the *right* values is a tedious trial-and-error process, often amounting to 'hacking' the test set to find what works best. This means LLMs aren't fully self-contained and require significant human intervention to perform optimally.
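To make the two knobs concrete, here is a minimal sketch of standard temperature plus top-p (nucleus) sampling over a single logits vector. This is the generic baseline decoding procedure the paper critiques, not the paper's code; the function name and ranges are illustrative.

```python
import numpy as np

def sample_with_temperature_top_p(logits, temperature=1.0, top_p=0.9, rng=None):
    """Standard temperature + top-p (nucleus) sampling over one logits vector."""
    rng = rng or np.random.default_rng(0)
    # Temperature scales the logits: <1 sharpens the distribution, >1 flattens it.
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Top-p keeps the smallest set of tokens whose cumulative mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    kept_probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept_probs))
```

With a very low temperature this collapses to greedy decoding (the highest-logit token is picked almost surely); with high temperature and top_p close to 1 it samples broadly. Picking the right point between those extremes for a given task is exactly the manual tuning the paper wants to eliminate.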

What's the solution?

The researchers introduce a new system called AutoDeco. It adds small, lightweight heads to the LLM that *learn* to adjust the temperature and top-p settings automatically, on a token-by-token basis, as text is generated. Instead of a human fixing these values beforehand, the model predicts them itself within the same forward pass that produces the next-token logits, making generation truly end-to-end. It essentially learns to control its own creativity and focus.
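The idea can be sketched as small extra heads reading the transformer's hidden state at each step. Everything below is a toy illustration under stated assumptions: the head names, the random weights, and the output ranges (temperature in (0, 2), top-p in (0, 1) via a sigmoid squash) are my guesses, not the paper's exact parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB = 16, 50

# Hypothetical lightweight heads on top of the transformer's final hidden state.
W_lm = rng.normal(scale=0.1, size=(HIDDEN, VOCAB))  # standard next-token head
w_temp = rng.normal(scale=0.1, size=HIDDEN)         # assumed temperature head
w_topp = rng.normal(scale=0.1, size=HIDDEN)         # assumed top-p head

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def autodeco_step(hidden_state):
    """One decoding step: predict next-token logits AND the sampling settings."""
    logits = hidden_state @ W_lm
    # Squash the scalar head outputs into plausible ranges; the ranges here
    # (temperature in (0, 2), top-p in (0, 1)) are assumptions for illustration.
    temperature = 2.0 * sigmoid(hidden_state @ w_temp)
    top_p = sigmoid(hidden_state @ w_topp)
    return logits, float(temperature), float(top_p)
```

Because the predicted temperature and top-p come out of the same forward pass as the logits, the sampler can use per-token values instead of one global setting, which is what makes the decoding strategy itself learnable.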

Why it matters?

This is important because it makes LLMs more efficient and easier to use. AutoDeco performs as well as, or even better than, carefully tuned manual settings, and it can even follow simple instructions like 'be less random' by adjusting its settings accordingly. This opens the door to more controllable and interactive LLMs, where users can directly influence the style and content of the generated text without needing to be experts in model settings.

Abstract

The "end-to-end" label for LLMs is a misnomer. In practice, they depend on a non-differentiable decoding process that requires laborious, hand-tuning of hyperparameters like temperature and top-p. This paper introduces AutoDeco, a novel architecture that enables truly "end-to-end" generation by learning to control its own decoding strategy. We augment the standard transformer with lightweight heads that, at each step, dynamically predict context-specific temperature and top-p values alongside the next-token logits. This approach transforms decoding into a parametric, token-level process, allowing the model to self-regulate its sampling strategy within a single forward pass. Through extensive experiments on eight benchmarks, we demonstrate that AutoDeco not only significantly outperforms default decoding strategies but also achieves performance comparable to an oracle-tuned baseline derived from "hacking the test set"-a practical upper bound for any static method. Crucially, we uncover an emergent capability for instruction-based decoding control: the model learns to interpret natural language commands (e.g., "generate with low randomness") and adjusts its predicted temperature and top-p on a token-by-token basis, opening a new paradigm for steerable and interactive LLM decoding.