Implicit Reasoning in Transformers is Reasoning through Shortcuts

Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang

2025-03-12

Summary

This paper shows that AI language models often solve multi-step math problems through quick shortcuts rather than genuine step-by-step reasoning, a strategy that breaks down when variables appear in trickier ways, such as being subtracted.

What's the problem?

AI models can solve simple math problems quickly but often fail at harder ones (such as equations where variables are subtracted) because they rely on memorized patterns instead of true logical reasoning.

What's the solution?

The researchers trained a GPT-2 model from scratch on curated multi-step math problems and found that it only learns genuine step-by-step reasoning when the training data follows a fixed pattern; more realistic training data with mixed patterns pushes the model toward shortcuts that fail on new problems.
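To make the fixed-pattern vs. mixed-pattern distinction concrete, here is a minimal sketch of what a generator for such multi-step training examples might look like. The problem format, variable names, and use of modular arithmetic are illustrative assumptions, not the paper's actual dataset construction:

```python
import random

def make_problem(steps=3, fixed_pattern=True, mod=100):
    """Generate one multi-step arithmetic problem as text plus its answer.

    fixed_pattern=True  -> every step uses the same operation (+);
    fixed_pattern=False -> operations are mixed (+ and -), the
    "unfixed-pattern" regime the paper links to shortcut learning.
    (Hypothetical format for illustration; the paper's dataset may differ.)
    """
    value = random.randint(0, mod - 1)
    parts = [f"a0={value}"]
    for i in range(1, steps + 1):
        op = "+" if fixed_pattern else random.choice(["+", "-"])
        operand = random.randint(0, mod - 1)
        # Track the ground-truth value so the label matches the text.
        value = (value + operand) % mod if op == "+" else (value - operand) % mod
        parts.append(f"a{i}=a{i-1}{op}{operand}")
    return ", ".join(parts) + f", a{steps}=?", value

random.seed(0)
text, answer = make_problem(steps=3, fixed_pattern=False)
print(text, "->", answer)
```

Training only on `fixed_pattern=True` data corresponds to the regime where genuine step-by-step implicit reasoning emerged, while mixing in subtraction steps corresponds to the regime where the model overfit to a shortcut.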

Why it matters?

This helps explain why AI struggles with complex reasoning tasks and suggests that better training methods are needed to make models truly reason step by step instead of guessing from familiar patterns.

Abstract

Test-time compute is emerging as a new paradigm for enhancing language models' complex multi-step reasoning capabilities, as demonstrated by the success of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit reasoning in test-time compute, implicit reasoning is more inference-efficient, requiring fewer generated tokens. However, why does the advanced reasoning capability fail to emerge in the implicit reasoning style? In this work, we train GPT-2 from scratch on a curated multi-step mathematical reasoning dataset and conduct analytical experiments to investigate how language models perform implicit reasoning in multi-step tasks. Our findings reveal: 1) Language models can perform step-by-step reasoning and achieve high accuracy in both in-domain and out-of-domain tests via implicit reasoning. However, this capability only emerges when trained on fixed-pattern data. 2) Conversely, implicit reasoning abilities emerging from training on unfixed-pattern data tend to overfit a specific pattern and fail to generalize further. Notably, this limitation is also observed in state-of-the-art large language models. These findings suggest that language models acquire implicit reasoning through shortcut learning, enabling strong performance on tasks with similar patterns while lacking generalization.