No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
Liyan Xu, Mo Yu, Fandong Meng, Jie Zhou
2026-02-04
Summary
This research explores how large language models, like those powering chatbots, 'think' when solving problems, specifically when they use a technique called 'Chain-of-Thought' reasoning. It investigates what's happening *inside* the model before it even starts to explain its reasoning step-by-step.
What's the problem?
While it's known that large language models can perform well with Chain-of-Thought, it wasn't clear how much of that ability comes from pre-existing planning within the model itself versus relying on the step-by-step explanation process. The researchers wanted to understand the extent to which these models actually plan ahead versus just reacting to the current step in a problem.
What's the solution?
The researchers developed a method called 'Tele-Lens' to peek into the hidden states of the language model – essentially, looking at the model's internal 'thoughts' – across different types of tasks. They found that the models tend to have a short-sighted view, making decisions incrementally rather than forming a detailed plan from the start. Based on this, they hypothesized that only a few key steps in the Chain-of-Thought are needed to gauge how confident the model is in its answer, and they confirmed this experimentally. They also demonstrated that the model can sometimes skip the Chain-of-Thought explanation entirely without losing accuracy.
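The exact design of Tele-Lens is not spelled out here, but the general recipe of probing hidden states can be illustrated with a minimal sketch: freeze the language model, read out a hidden state before the chain of thought is generated, and train a small linear classifier to predict something about a later reasoning step. The model name, layer choice, and the "future step" labels below are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of a hidden-state probe (assumptions: model name, layer, labels).
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # assumption: any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

@torch.no_grad()
def hidden_state_at_last_token(prompt: str, layer: int = -1) -> torch.Tensor:
    """Return the hidden state of the final prompt token at a chosen layer."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model(**inputs)
    # outputs.hidden_states: tuple of (num_layers + 1) tensors, each [batch, seq, dim]
    return outputs.hidden_states[layer][0, -1]

# Linear probe: predicts a label about a *later* reasoning step (e.g., which
# operation the model will use two steps ahead) from the current hidden state.
num_future_labels = 4  # hypothetical label space
probe = nn.Linear(model.config.hidden_size, num_future_labels)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_probe(examples):
    """examples: list of (prompt, future_step_label) pairs -- hypothetical data."""
    for prompt, label in examples:
        features = hidden_state_at_last_token(prompt).float()
        logits = probe(features)
        loss = loss_fn(logits.unsqueeze(0), torch.tensor([label]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

If the probe can predict distant steps well above chance, the hidden state arguably encodes a plan; if accuracy decays quickly with lookahead distance, the model's planning horizon is short, which is the "myopic" picture the paper reports.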
Why it matters?
This work is important because understanding how these models reason helps us build more reliable and trustworthy AI. By knowing that they don't necessarily plan far ahead, we can improve their reasoning abilities and better estimate when they might make mistakes. It also suggests ways to make these models more efficient by streamlining the reasoning process without sacrificing performance.
Abstract
This work stems from prior complementary observations on the dynamics of Chain-of-Thought (CoT): Large Language Models (LLMs) have been shown to latently plan subsequent reasoning before the CoT emerges, which diminishes the significance of explicit CoT; yet CoT remains critical for tasks requiring multi-step reasoning. To deepen the understanding of the relationship between an LLM's internal states and its verbalized reasoning trajectories, we investigate the latent planning strength of LLMs through our probing method, Tele-Lens, applied to hidden states across diverse task domains. Our empirical results indicate that LLMs exhibit a myopic horizon, primarily conducting incremental transitions without precise global planning. Leveraging this characteristic, we propose a hypothesis for enhancing uncertainty estimation of CoT, and validate that a small subset of CoT positions can effectively represent the uncertainty of the entire path. We further underscore the significance of exploiting CoT dynamics, and demonstrate that automatic recognition of CoT bypass can be achieved without performance degradation. Our code, data, and models are released at https://github.com/lxucs/tele-lens.
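As a rough illustration of the uncertainty hypothesis in the abstract, the sketch below scores only a small subset of CoT token positions (here, the least-confident tokens) and treats that as the uncertainty of the whole path. The selection rule and the value of k are assumptions for illustration, not the paper's actual criterion.

```python
# Hedged sketch: path uncertainty from a few CoT positions (selection rule assumed).
import torch

def path_uncertainty(token_logprobs: torch.Tensor, k: int = 5) -> float:
    """token_logprobs: log-probabilities of each generated CoT token, shape [T].

    Returns the mean negative log-probability of the k least-confident tokens,
    used as a proxy for the uncertainty of the full reasoning path.
    """
    neg_logprobs = -token_logprobs
    k = min(k, neg_logprobs.numel())
    worst_k, _ = torch.topk(neg_logprobs, k)
    return worst_k.mean().item()

# Usage: token_logprobs could be gathered from model.generate(..., output_scores=True)
# by taking the log-softmax score of each sampled token.
```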