Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

Zorik Gekhman, Roee Aharoni, Eran Ofek, Mor Geva, Roi Reichart, Jonathan Herzig

2026-03-11

Summary

This paper investigates why getting large language models (LLMs) to 'think out loud' – that is, generate reasoning steps before answering – helps them recall even simple facts, when those questions don't *need* step-by-step thinking at all.

What's the problem?

It seems strange that asking an LLM to explain its reasoning would improve its ability to recall basic facts. If a question is straightforward, like 'What is the capital of France?', why would generating extra text about related topics help it get the right answer? The researchers wanted to understand this counterintuitive phenomenon and figure out *how* reasoning helps with simple fact recall.

What's the solution?

The researchers ran a bunch of experiments where they carefully controlled what the LLM was asked to generate during its 'reasoning' process. They discovered two main things happening: first, the LLM seems to use the generated text as a kind of scratchpad for internal calculations, even if the text itself doesn't make logical sense. Second, generating facts related to the question acts like a memory cue, helping the model retrieve the correct answer from its existing knowledge. However, they also found that if the LLM makes up facts during this reasoning process, it's more likely to make up the final answer too.

Why it matters?

This research is important because it gives us a better understanding of how LLMs actually work. Knowing that reasoning can act as a computational buffer and a retrieval cue, but also carries the risk of hallucinations, allows us to build better models. Specifically, we can design systems that encourage accurate reasoning paths and avoid generating false information, ultimately leading to more reliable and trustworthy AI.

Abstract

While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Nevertheless, we find that enabling reasoning substantially expands the capability boundary of the model's parametric knowledge recall, unlocking correct answers that are otherwise effectively unreachable. Why does reasoning aid parametric knowledge recall when there are no complex reasoning steps to be done? To answer this, we design a series of hypothesis-driven controlled experiments, and identify two key driving mechanisms: (1) a computational buffer effect, where the model uses the generated reasoning tokens to perform latent computation independent of their semantic content; and (2) factual priming, where generating topically related facts acts as a semantic bridge that facilitates correct answer retrieval. Importantly, this latter generative self-retrieval mechanism carries inherent risks: we demonstrate that hallucinating intermediate facts during reasoning increases the likelihood of hallucinations in the final answer. Finally, we show that our insights can be harnessed to directly improve model accuracy by prioritizing reasoning trajectories that contain hallucination-free factual statements.