Mechanistic Interpretability of Large-Scale Counting in LLMs through a System-2 Strategy

Hosein Hasani, Mohammadali Banayeeanzade, Ali Nafisi, Sadegh Mohammadian, Fatemeh Askari, Mobin Bagherian, Amirmohammad Izadi, Mahdieh Soleymani Baghshah

2026-01-07

Summary

This paper investigates why large language models, while good at many complex tasks, struggle with simple counting, and proposes a way to help them count more accurately.

What's the problem?

Large language models often fail at counting tasks, especially as the numbers grow. This isn't because they lack intelligence, but because of how they're built. These models, called transformers, process information through a fixed number of layers, and counting relies on information being carried accurately through all of them. Because the count accumulates across layers, the model's fixed depth puts a ceiling on how high it can count accurately.

What's the solution?

The researchers came up with a strategy inspired by how humans solve hard problems: breaking a big task into smaller, manageable steps. They didn't change the model itself; instead, they instructed it to count in stages. For example, instead of counting all the objects at once, the model first counts small groups of objects, records those partial counts, and then adds them together. The researchers then analyzed *how* the model carries this out, finding that it stores partial counts in the representations of each group and uses dedicated attention heads to transfer that information between steps.
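The staged counting idea can be illustrated with a small sketch. This is not the paper's actual prompting setup; `count_in_stages` and its `chunk_size` parameter are hypothetical names used here only to show the decomposition: count each small chunk independently, store the partial counts, and aggregate them at the end.

```python
def count_in_stages(items, target, chunk_size=10):
    """Count occurrences of `target` by splitting `items` into small
    chunks, counting each chunk independently, and then summing the
    partial counts -- mirroring the staged strategy described above."""
    partial_counts = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Each sub-problem is small enough to be solved reliably on its own.
        partial_counts.append(sum(1 for item in chunk if item == target))
    # Final stage: aggregate the stored partial counts into the total.
    return sum(partial_counts)

items = ["apple"] * 37 + ["pear"] * 13
print(count_in_stages(items, "apple"))  # -> 37
```

The point of the decomposition is that no single stage ever has to track a large running total; each sub-count stays within a range the model can handle reliably.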

Why it matters?

This work is important because it helps us understand *why* these powerful models sometimes make seemingly simple mistakes. By figuring out the mechanism behind this 'step-by-step' counting strategy, we can improve the reasoning abilities of these models and make them more reliable for tasks that require precise calculations or tracking of quantities. It also gives us insight into how these models might be mimicking human thought processes.

Abstract

Large language models (LLMs), despite strong performance on complex mathematical problems, exhibit systematic limitations in counting tasks. This issue arises from architectural limits of transformers, where counting is performed across layers, leading to degraded precision for larger counting problems due to depth constraints. To address this limitation, we propose a simple test-time strategy inspired by System-2 cognitive processes that decomposes large counting tasks into smaller, independent sub-problems that the model can reliably solve. We evaluate this approach using observational and causal mediation analyses to understand the underlying mechanism of this System-2-like strategy. Our mechanistic analysis identifies key components: latent counts are computed and stored in the final item representations of each part, transferred to intermediate steps via dedicated attention heads, and aggregated in the final stage to produce the total count. Experimental results demonstrate that this strategy enables LLMs to surpass architectural limitations and achieve high accuracy on large-scale counting tasks. This work provides mechanistic insight into System-2 counting in LLMs and presents a generalizable approach for improving and understanding their reasoning behavior.