Think Anywhere in Code Generation

Xue Jiang, Tianyu Zhang, Ge Li, Mengyang Liu, Taozhi Chen, Zhenhua Xu, Binhua Li, Wenpin Jiao, Zhi Jin, Yongbin Li, Yihong Dong

2026-04-01

Summary

This paper introduces a new way for large language models (LLMs) to solve coding problems. It focuses on improving how these models 'think' while writing code, making them better at handling complex tasks.

What's the problem?

Current LLMs often try to figure out the entire solution to a coding problem *before* they start writing any code. This 'think first' approach doesn't work well for coding because you often only understand the full problem as you actually try to build the solution. It's like trying to plan a whole road trip without knowing what obstacles you'll encounter along the way. Also, some parts of a coding problem are harder than others, and existing methods don't adjust how much 'thinking' they do based on the difficulty of each step.

What's the solution?

The researchers developed a method called 'Think-Anywhere' that lets the LLM pause and reason *during* code generation, at any token position while writing. They first trained the model to imitate good reasoning patterns (a 'cold-start' phase), then used an outcome-based reward system to encourage the model to figure out on its own *when* and *where* to invoke reasoning. Essentially, the model learns to pause and think when it encounters a tricky part of the code.
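To make the idea concrete, here is a minimal toy sketch of on-demand inline reasoning during decoding. The paper's analysis notes that the trained model tends to invoke reasoning at high-entropy positions, so this sketch uses an explicit entropy threshold as the trigger; that threshold rule, the `<think>...</think>` markers, and all function names are illustrative assumptions, not the paper's actual implementation (which learns the trigger via RL rather than a hand-set rule).

```python
import math

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def generate_with_inline_thinking(steps, threshold=1.0):
    """Toy decoding loop: before emitting each code token, check the
    model's next-token entropy; if it is high (the model is uncertain),
    insert an inline <think>...</think> reasoning span first.

    `steps` is a list of (next-token distribution, chosen token) pairs
    standing in for a real LLM's per-step predictions."""
    out = []
    for probs, token in steps:
        if entropy(probs) > threshold:
            # High uncertainty at this position: invoke reasoning
            # before committing to the next code token.
            out.append(f"{THINK_OPEN} reason about '{token}' {THINK_CLOSE}")
        out.append(token)
    return out

# Easy step: the model is confident (low entropy), so no thinking.
# Hard step: probability mass is spread out (high entropy), so think first.
steps = [
    ({"def": 0.95, "class": 0.05}, "def"),
    ({"sort": 0.3, "merge": 0.3, "heapq": 0.2, "bisect": 0.2}, "merge"),
]
print(generate_with_inline_thinking(steps))
```

Running this prints a token stream where the confident step passes through untouched, while the uncertain step is preceded by an inline reasoning span, mirroring the idea of allocating thinking only where the generation is hard.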

Why it matters?

This is important because it significantly improves the ability of LLMs to write code correctly. The 'Think-Anywhere' method outperforms existing techniques on several standard coding benchmarks and works well with different types of LLMs. It also makes the model's reasoning process more transparent, showing *where* it needed to think harder, which helps us understand how these models work and improve them further.

Abstract

Recent advances in reasoning Large Language Models (LLMs) have primarily relied on upfront thinking, where reasoning occurs before the final answer. However, this approach suffers from critical limitations in code generation, where upfront thinking is often insufficient as a problem's full complexity only reveals itself during code implementation. Moreover, it cannot adaptively allocate reasoning effort throughout the code generation process, where difficulty varies significantly. In this paper, we propose Think-Anywhere, a novel reasoning mechanism that enables LLMs to invoke thinking on demand at any token position during code generation. We achieve Think-Anywhere by first teaching LLMs to imitate reasoning patterns through cold-start training, then leveraging outcome-based RL rewards to drive the model's autonomous exploration of when and where to invoke reasoning. Extensive experiments on four mainstream code generation benchmarks (i.e., LeetCode, LiveCodeBench, HumanEval, and MBPP) show that Think-Anywhere achieves state-of-the-art performance over both existing reasoning methods and recent post-training approaches, while demonstrating consistent generalization across diverse LLMs. Our analysis further reveals that Think-Anywhere enables the model to adaptively invoke reasoning at high-entropy positions, providing enhanced interpretability.