Think While You Generate: Discrete Diffusion with Planned Denoising

Sulin Liu, Juno Nam, Andrew Campbell, Hannes Stärk, Yilun Xu, Tommi Jaakkola, Rafael Gómez-Bombarelli

2024-10-14

Summary

This paper introduces a new method called Discrete Diffusion with Planned Denoising (DDPD), which improves how discrete diffusion models generate data such as text and image tokens by separating the generation process into two parts: planning and denoising.

What's the problem?

Generating high-quality text or images with discrete diffusion models is challenging because traditional denoiser-only methods do not choose the order in which they fix errors during generation. They cannot identify which positions in the output are most corrupted, and once a position has been denoised incorrectly, they have no mechanism for revisiting and refining it, which limits the quality of the final result.

What's the solution?

The DDPD framework uses a two-step approach. First, a planner model decides which positions in the output need fixing by identifying the most corrupted ones, including both positions that are still masked and positions that were denoised earlier but need further refinement. Then, a denoiser model predicts clean values for those specific positions. Iterating this plan-and-denoise cycle lets the model tackle the most important corrections first, leading to better overall results. Experiments show that DDPD outperforms denoiser-only mask diffusion methods on benchmarks including text8, OpenWebText, and token-based ImageNet generation.
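The plan-and-denoise loop described above can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the `MASK` token, the `planner` and `denoiser` callables, and their signatures are all hypothetical stand-ins for the paper's learned models.

```python
MASK = -1  # hypothetical token id marking a corrupted position

def ddpd_sample(length, planner, denoiser, max_steps=100):
    """Toy plan-and-denoise loop: start from a fully corrupted
    sequence, then repeatedly (1) ask the planner which position is
    most corrupted and (2) ask the denoiser to predict a clean token
    for that position. Stops when the planner finds nothing to fix."""
    seq = [MASK] * length
    for _ in range(max_steps):
        pos = planner(seq)          # which position to denoise next?
        if pos is None:             # planner sees no remaining corruption
            break
        seq[pos] = denoiser(seq, pos)  # predict a clean token there
    return seq

def toy_planner(seq):
    """Stand-in planner: treat any masked position as corrupted and
    pick the first one. A learned planner would instead score every
    position's corruption level, including previously denoised ones."""
    for i, tok in enumerate(seq):
        if tok == MASK:
            return i
    return None
```

A real planner would also flag positions that were denoised earlier but look wrong, which is what lets DDPD refine its own mistakes rather than committing to them.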

Why it matters?

This research is significant because it enhances the ability of diffusion models to generate high-quality content, particularly for complex tasks. By improving how models plan and correct their outputs, DDPD narrows the gap with autoregressive methods and can lead to advancements in fields like natural language processing and image generation, where detailed and accurate generation is crucial.

Abstract

Discrete diffusion has achieved state-of-the-art performance, outperforming or approaching autoregressive models on standard benchmarks. In this work, we introduce Discrete Diffusion with Planned Denoising (DDPD), a novel framework that separates the generation process into two models: a planner and a denoiser. At inference time, the planner selects which positions to denoise next by identifying the most corrupted positions in need of denoising, including both initially corrupted and those requiring additional refinement. This plan-and-denoise approach enables more efficient reconstruction during generation by iteratively identifying and denoising corruptions in the optimal order. DDPD outperforms traditional denoiser-only mask diffusion methods, achieving superior results on language modeling benchmarks such as text8, OpenWebText, and token-based generation on ImageNet 256×256. Notably, in language modeling, DDPD significantly reduces the performance gap between diffusion-based and autoregressive methods in terms of generative perplexity. Code is available at https://github.com/liusulin/DDPD.