DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Shidong Cao, Hongzhan Lin, Yuxuan Gu, Ziyang Luo, Jing Ma

2026-01-09

Summary

This paper introduces a new method called DiffCoT to improve how large language models solve complex math problems that require multiple steps of reasoning.

What's the problem?

While large language models are getting better at solving multi-step problems by 'thinking through' the steps (Chain-of-Thought reasoning), they still struggle with two main issues. First, they tend to mimic the surface style of the examples they were trained on (a problem known as exposure bias), which can lead to wrong answers when a problem differs slightly from what they have seen. Second, if they make a mistake early in the process, that error gets carried forward and ruins the rest of the solution (error accumulation), because the model builds each step on the previous one.

What's the solution?

DiffCoT tackles these problems by treating the reasoning process like gradually refining a blurry image. Instead of generating each step of the solution directly, it starts with a noisy, incomplete version and then iteratively 'denoises' it, improving each step. It does this using a 'sliding window' that focuses on a small part of the reasoning at a time, allowing it to both create new steps and correct previous ones. Importantly, the method ensures that the reasoning steps still flow logically from one to the next, maintaining a clear cause-and-effect relationship.
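The sliding-window refinement idea can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the `refine` callback, the `window` and `iterations` parameters, and the string-based "noise" are all hypothetical stand-ins for the model's actual denoising pass over reasoning steps.

```python
from typing import Callable, List

Step = str

def sliding_window_denoise(
    steps: List[Step],
    refine: Callable[[List[Step], int], Step],
    window: int = 2,
    iterations: int = 1,
) -> List[Step]:
    """Toy sketch of diffusion-styled refinement over a reasoning chain.

    `refine(chain, i)` returns an improved version of step i given the
    current chain as context (a stand-in for one model denoising pass).
    The window slides left to right, so each pass can both extend the
    chain and retrospectively correct earlier steps; within a window,
    earlier steps are refined first to respect the chain's causal order.
    """
    chain = list(steps)
    for _ in range(iterations):
        for start in range(0, max(1, len(chain) - window + 1)):
            for i in range(start, min(start + window, len(chain))):
                chain[i] = refine(chain, i)
    return chain

# Toy usage: '~' marks leftover "noise" that one refine pass strips away.
noisy_chain = ["~2 + 2 = 4~", "~so x = 4~", "~answer: 4~"]
clean_chain = sliding_window_denoise(
    noisy_chain, refine=lambda chain, i: chain[i].strip("~")
)
print(clean_chain)  # ['2 + 2 = 4', 'so x = 4', 'answer: 4']
```

The key property mirrored here is that steps already generated remain editable: a later window pass can revise an earlier step instead of locking it in, which is what standard left-to-right autoregressive decoding cannot do.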

Why it matters?

This research is important because it makes these language models more reliable and accurate when solving complex problems. By improving their ability to correct mistakes and avoid simply mimicking training examples, DiffCoT helps unlock the potential of these models for tasks that require careful, step-by-step reasoning, like advanced mathematics or scientific problem-solving.

Abstract

Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, as early mistakes propagate irreversibly through autoregressive decoding. In this work, we propose DiffCoT, a diffusion-styled CoT framework that reformulates CoT reasoning as an iterative denoising process. DiffCoT integrates diffusion principles at the reasoning-step level via a sliding-window mechanism, enabling unified generation and retrospective correction of intermediate steps while preserving token-level autoregression. To maintain causal consistency, we further introduce a causal diffusion noise schedule that respects the temporal structure of reasoning chains. Extensive experiments on three multi-step CoT reasoning benchmarks across diverse model backbones demonstrate that DiffCoT consistently outperforms existing CoT preference optimization methods, yielding improved robustness and error-correction capability in CoT reasoning.