
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Tianyu Fu, Yichen You, Zekai Chen, Guohao Dai, Huazhong Yang, Yu Wang

2025-11-19


Summary

This paper focuses on improving the reasoning abilities of large language models, those powerful AI systems that generate text, without making them significantly larger or more complex.

What's the problem?

Large language models sometimes struggle with complex reasoning tasks. A previous attempt to improve them involved having the model re-think its answer multiple times, running extra internal iterations for each word it generates. However, researchers found a surprising issue: the model would sometimes *revise* predictions that were already right after the first pass, turning correct answers into wrong ones through overthinking. This is a problem because it wastes processing power and reduces accuracy.
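To make that baseline concrete, here is a minimal sketch of fixed-iteration recurrence, where the last layer's hidden states are fed back for extra passes at *every* token. The function and argument names are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

def recurrent_forward(layers: nn.ModuleList,
                      embeds: torch.Tensor,
                      num_extra_iters: int = 1) -> torch.Tensor:
    """Fixed-iteration recurrence: every token gets the same extra passes."""
    hidden = embeds
    # Standard forward pass through the shared transformer layers.
    for layer in layers:
        hidden = layer(hidden)
    # Re-feed the last-layer hidden states as inputs for extra iterations,
    # at ALL token positions -- including easy tokens that were already
    # correct, which is where "latent overthinking" can flip them to errors.
    for _ in range(num_extra_iters):
        for layer in layers:
            hidden = layer(hidden)
    return hidden
```

Because every token pays for the extra iterations, the cost scales with sequence length even when most predictions were already right after the first pass.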

What's the solution?

The researchers developed a new method called 'Think-at-Hard' (TaH). Instead of making the model re-think *every* word, TaH uses a small neural 'decider' within the model to identify only the words it is likely to get wrong, and focuses the extra thinking steps on those difficult words. During these extra iterations, a technique called LoRA (Low-Rank Adaptation) shifts the model's goal from predicting the next word in general to refining the hard words, using very few extra parameters. They also adjusted how the model pays attention to information across re-thinking steps (a 'duo-causal' attention mechanism), allowing it to learn from previous attempts without slowing down the process.
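Below is a hedged sketch of the selective part of this idea: a tiny decider scores each token, and only tokens flagged as hard get the LoRA-adapted second iteration. The decider architecture, the `lora_backbone` callable, and the 0.5 threshold are all assumptions for illustration; the paper's actual components may differ.

```python
import torch
import torch.nn as nn

class HardTokenDecider(nn.Module):
    """Tiny classifier predicting which tokens are likely wrong."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Probability that each token needs another latent iteration.
        return torch.sigmoid(self.proj(hidden)).squeeze(-1)  # (batch, seq)

def think_at_hard(hidden: torch.Tensor,
                  decider: HardTokenDecider,
                  lora_backbone,
                  threshold: float = 0.5) -> torch.Tensor:
    """Run a second latent iteration only at predicted-hard tokens."""
    p_hard = decider(hidden)                  # per-token hardness scores
    hard_mask = p_hard > threshold            # which tokens iterate again
    # The LoRA-adapted pass refines all positions in parallel; we keep
    # the refined states only where the decider flagged the token as hard,
    # leaving easy tokens' first-pass states untouched.
    refined = lora_backbone(hidden)
    return torch.where(hard_mask.unsqueeze(-1), refined, hidden)
```

Keeping first-pass states for easy tokens is what avoids both the wasted compute and the overthinking errors described above.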

Why it matters?

This research is important because it shows how to make large language models better at reasoning without drastically increasing their size or computational cost. By focusing only on the parts the model struggles with, TaH skips the second iteration for 94% of output tokens while still improving accuracy by 8.1-11.3% over baselines that iterate on every token. This makes these powerful AI tools more practical for real-world applications where resources are limited.

Abstract

Improving reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Prior work proposes recurrent transformers, which allocate a fixed number of extra iterations per token to improve generation quality. After the first, standard forward pass, instead of verbalization, last-layer hidden states are fed back as inputs for additional iterations to refine token predictions. Yet we identify a latent overthinking phenomenon: easy token predictions that are already correct after the first pass are sometimes revised into errors in additional iterations. To address this, we propose Think-at-Hard (TaH), a dynamic latent thinking method that iterates deeper only at hard tokens. It employs a lightweight neural decider to trigger latent iterations only at tokens that are likely incorrect after the standard forward pass. During latent iterations, Low-Rank Adaptation (LoRA) modules shift the LLM objective from general next-token prediction to focused hard-token refinement. We further introduce a duo-causal attention mechanism that extends attention from the token sequence dimension to an additional iteration depth dimension. This enables cross-iteration information flow while maintaining full sequential parallelism. Experiments show that TaH boosts LLM reasoning performance across five challenging benchmarks while maintaining the same parameter count. Compared with baselines that iterate twice for all output tokens, TaH delivers 8.1-11.3% accuracy gains while exempting 94% of tokens from the second iteration. Against strong single-iteration Qwen3 models finetuned with the same data, it also delivers 4.0-5.0% accuracy gains. When allowing less than 3% additional parameters from LoRA and the iteration decider, the gains increase to 8.5-12.6% and 5.3-5.4%, respectively. Our code is available at https://github.com/thu-nics/TaH.
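The abstract's duo-causal attention can be pictured as a mask that is causal along two axes at once: later tokens can see earlier tokens, and later iterations can see earlier iterations' states. Here is a hedged sketch of such a mask under that reading; the paper's exact masking rule may differ.

```python
import torch

def duo_causal_mask(seq_len: int, num_iters: int) -> torch.Tensor:
    """Boolean mask over flattened (iteration, token) positions.

    Position (d, t) maps to index d * seq_len + t. Entry [q, k] is True
    when the query may attend to the key, i.e. the key's token index and
    iteration depth are both <= the query's (causal over both axes).
    """
    depth = torch.arange(num_iters).repeat_interleave(seq_len)  # 0,0,..,1,1,..
    token = torch.arange(seq_len).repeat(num_iters)             # 0,1,..,0,1,..
    token_ok = token[:, None] >= token[None, :]   # causal over the sequence
    depth_ok = depth[:, None] >= depth[None, :]   # causal over iteration depth
    return token_ok & depth_ok

# Usage: an 8x8 mask for 4 tokens refined over 2 iterations.
mask = duo_causal_mask(seq_len=4, num_iters=2)
```

Because the mask is defined over all tokens at once, every sequence position can be processed in parallel within an iteration, matching the abstract's claim of cross-iteration information flow with full sequential parallelism.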