Therefore I Am, I Think
Esakkivel Esakkiraja, Sai Rajeswar, Denis Akhiyarov, Rajagopal Venkatesaramani
2026-04-03
Summary
This research investigates whether large language models actually *think* through a problem before deciding on an answer, or if they make a decision first and then create a justification for it afterward.
What's the problem?
We often assume that when a language model gives a detailed, step-by-step explanation (called 'chain-of-thought' reasoning), it's genuinely working through the problem logically. However, it's unclear if the model first arrives at a conclusion and *then* generates the reasoning to support it, or if the reasoning process truly leads to the decision. The core question is: does the decision come before or after the thinking?
What's the solution?
The researchers found strong evidence that the decision is often made very early in the process, even before the model starts generating the text of its reasoning. They used a technique to 'read' the model's internal activity (activations) and accurately predict what decision the model would make, sometimes even before a single word of explanation was produced. They also directly manipulated these internal signals, and found that changing the 'decision direction' caused the model to change its answer and then adjust its reasoning to fit the new answer, rather than sticking to its original thought process.
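The 'reading' technique described above is a linear probe: a simple linear classifier trained on the model's internal activations to predict its eventual decision. The sketch below illustrates the idea on synthetic data; the Gaussian clusters, dimensions, and "decision direction" are fabricated stand-ins (in the paper's setting, the inputs would be hidden states captured from a reasoning model before any chain-of-thought tokens are generated).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fabricate "pre-generation activations": two Gaussian clusters separated
# along one hypothetical "decision direction", so the sketch is
# self-contained and runnable without a real model.
rng = np.random.default_rng(0)
d = 64                                   # hidden-state dimensionality
decision_dir = rng.normal(size=d)
decision_dir /= np.linalg.norm(decision_dir)

n = 400
labels = rng.integers(0, 2, size=n)      # 1 = "call the tool", 0 = "don't"
acts = rng.normal(size=(n, d)) + 3.0 * labels[:, None] * decision_dir

# The linear probe itself: plain logistic regression on raw activations.
probe = LogisticRegression(max_iter=1000).fit(acts[:300], labels[:300])
acc = probe.score(acts[300:], labels[300:])
print(f"probe accuracy on held-out activations: {acc:.2f}")
```

Because the probe is linear, high accuracy implies the decision is encoded along a single direction in activation space, which is what makes the steering experiment in the next section possible.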
Why it matters?
This is important because it challenges our understanding of how these powerful language models work. If models are primarily justifying pre-existing decisions rather than truly reasoning, it raises concerns about their reliability and potential for bias. It suggests that the impressive 'thinking' we see might be more of a sophisticated storytelling ability than genuine problem-solving.
Abstract
We consider the question: when a large language reasoning model makes a choice, does it think first and then decide, or decide first and then think? In this paper, we present evidence that detectable, early-encoded decisions shape chain-of-thought in reasoning models. Specifically, we show that a simple linear probe successfully decodes tool-calling decisions from pre-generation activations with very high confidence, in some cases before a single reasoning token is produced. Activation steering supports this causally: perturbing the decision direction leads to inflated deliberation and flips behavior in many examples (7% to 79%, depending on model and benchmark). We also show through behavioral analysis that, when steering changes the decision, the chain-of-thought often rationalizes the flip rather than resisting it. Together, these results suggest that reasoning models can encode action choices before they begin to deliberate in text.
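The steering intervention in the abstract can be sketched in a few lines. Assume we already have a unit vector `w` that separates the two decisions (e.g. a trained probe's weight vector, as above); steering adds a scaled copy of `w` to a hidden state, and a large enough step pushes the activation across the decision boundary. All names and magnitudes here are illustrative, not the paper's actual values.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64
w = rng.normal(size=d)
w /= np.linalg.norm(w)                   # hypothetical "decision direction"

def decide(h):
    # Hard decision read off the activation: which side of the boundary?
    return int(h @ w > 0)

def steer(h, alpha):
    # Activation steering: perturb the hidden state along the direction.
    return h + alpha * w

# Construct an activation exactly 2 units on the "don't call" side:
# random vector, component along w removed, then offset by -2w.
h = rng.normal(size=d)
h = h - (h @ w) * w - 2.0 * w

before = decide(h)            # 0: no tool call
after = decide(steer(h, 4.0)) # 1: a +4 step along w flips the decision
print(before, after)
```

In the paper's setting the perturbed state would be written back into the model mid-forward-pass, after which generation continues and, per the behavioral analysis, the chain-of-thought tends to rationalize the flipped choice.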