Large Language Models Explore by Latent Distilling
Yuanhao Zeng, Ao Lu, Lufei Li, Zheng Zhang, Yexin Li, Kan Ren
2026-04-30
Summary
This paper introduces a new method called Exploratory Sampling (ESamp) to make large language models (LLMs) generate more diverse and creative responses, going beyond just changing a few words here and there.
What's the problem?
Large language models are great at generating text, but when you ask them to come up with multiple different answers to the same question, they often just produce variations that are very similar to each other. This limits their usefulness for tasks where you really need a range of ideas, like problem-solving or creative writing. Standard methods for adding randomness don't really solve this because they mostly change the wording without exploring truly different concepts.
What's the solution?
The researchers noticed that LLMs are more confident (make fewer errors) when processing information similar to what they've seen before, and less confident when faced with something new. They used this idea to train a small 'Distiller' model that tries to predict what's happening inside the LLM as it generates text. This Distiller learns to identify when the LLM is venturing into 'uncharted territory' – generating something less predictable. ESamp then uses this 'novelty signal' to encourage the LLM to explore those less-familiar ideas, effectively biasing it towards more diverse outputs. Importantly, this Distiller is trained *while* the LLM is generating text, allowing it to adapt to the specific context of the current generation, and it doesn't slow things down much.
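To make the idea concrete, here is a minimal sketch of what such an online-trained Distiller could look like. The MLP shape, the choice of layers, and names like `shallow_hidden` and `deep_hidden` are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class Distiller(nn.Module):
    """Small MLP that predicts a deep-layer hidden state from a shallow-layer one.

    Hypothetical stand-in for the paper's Distiller; the layer choice and
    network size here are assumptions, not taken from the paper.
    """
    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, shallow_hidden: torch.Tensor) -> torch.Tensor:
        return self.net(shallow_hidden)


def novelty_and_update(distiller, optimizer, shallow_hidden, deep_hidden):
    """Return per-token novelty scores and take one online training step.

    Novelty = the Distiller's prediction error between its guess of the
    deep-layer state and the LLM's actual deep-layer state at each position.
    """
    shallow_hidden = shallow_hidden.detach()   # no gradients into the LLM
    deep_hidden = deep_hidden.detach()

    pred = distiller(shallow_hidden)
    per_token_error = (pred - deep_hidden).pow(2).mean(dim=-1)  # novelty signal

    loss = per_token_error.mean()              # adapt to the current context
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return per_token_error.detach()
```

In this sketch, a high prediction error marks a token position as "uncharted territory" for the current context, and that score is what the sampler can then use to nudge generation toward less-familiar continuations.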
Why it matters?
This work is important because it offers a way to significantly improve the diversity of LLM outputs without sacrificing the quality or coherence of the generated text. It boosts performance on challenging tasks like math, science, and coding, and allows for more creative and varied writing. By breaking the trade-off between diversity and quality, ESamp makes LLMs more useful for a wider range of applications where exploring different possibilities is key.
Abstract
Generating diverse responses is crucial for test-time scaling of large language models (LLMs), yet standard stochastic sampling mostly yields surface-level lexical variation, limiting semantic exploration. In this paper, we propose Exploratory Sampling (ESamp), a decoding approach that explicitly encourages semantic diversity during generation. ESamp is motivated by the well-known observation that neural networks tend to make lower-error predictions on inputs similar to those encountered before, and incur higher prediction error on novel ones. Building on this property, we train a lightweight Distiller at test time to predict deep-layer hidden representations of the LLM from its shallow-layer representations, modeling the LLM's depth-wise representation transitions. During decoding, the Distiller continuously adapts to the mappings induced by the current generation context. ESamp uses the prediction error as a novelty signal to reweight candidate token extensions conditioned on the current prefix, thereby biasing decoding toward less-explored semantic patterns. ESamp is implemented with an asynchronous training-inference pipeline, incurring less than 5% worst-case overhead (1.2% in the optimized release). Empirical results show that ESamp significantly boosts the Pass@k efficiency of reasoning models, achieving performance superior or comparable to strong stochastic and heuristic baselines. Notably, ESamp generalizes robustly across mathematics, science, and code generation benchmarks and breaks the trade-off between diversity and coherence in creative writing. Our code is released at: https://github.com/LinesHogan/tLLM.
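For illustration, the sketch below shows one way a novelty signal could reweight candidate token extensions during decoding. The additive logit bonus and the `alpha` coefficient are assumed forms chosen for exposition, not the paper's exact formulation.

```python
import torch

def exploratory_sample(logits: torch.Tensor, novelty: torch.Tensor,
                       alpha: float = 1.0, top_k: int = 20) -> int:
    """Sample one token, biasing the model's top-k candidates toward novelty.

    `logits` and `novelty` are 1-D tensors over the vocabulary; `novelty[t]`
    is the Distiller's prediction error if token t were appended to the
    current prefix. The additive bonus scaled by `alpha` is an assumption
    made for this sketch.
    """
    top_logits, top_ids = torch.topk(logits, top_k)
    bonus = novelty[top_ids]                           # novelty of each candidate
    probs = torch.softmax(top_logits + alpha * bonus, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)   # sample from reweighted candidates
    return int(top_ids[choice])
```

Setting `alpha = 0` recovers ordinary top-k sampling, so the coefficient acts as an exploration knob on top of the model's own probabilities.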