A Noise is Worth Diffusion Guidance
Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha, Kyong Hwan Jin, Seungryong Kim
2024-12-06
Summary
This paper introduces an approach that improves image generation in diffusion models by refining the initial noise, rather than relying on guidance methods such as classifier-free guidance (CFG).
What's the problem?
Diffusion models are excellent at creating images, but they typically need additional guidance, such as classifier-free guidance (CFG), to produce reliable results. CFG evaluates the denoiser twice per sampling step, once with the text condition and once without, which roughly doubles inference compute and memory and may not always be necessary.
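As a point of reference, here is a minimal sketch of classifier-free guidance; the `denoiser`, `cond_emb`, and `uncond_emb` names are illustrative placeholders, not the paper's code.

```python
def cfg_noise_prediction(denoiser, x_t, t, cond_emb, uncond_emb, guidance_scale=7.5):
    """Classifier-free guidance: two denoiser passes per sampling step,
    extrapolating from the unconditional toward the conditional prediction."""
    eps_uncond = denoiser(x_t, t, uncond_emb)  # pass without the text condition
    eps_cond = denoiser(x_t, t, cond_emb)      # pass with the text condition
    # A larger guidance_scale strengthens prompt adherence, at roughly
    # double the compute and memory of an unguided step.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```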
What's the solution?
The researchers propose replacing guidance with a single refinement of the initial noise. They show that by mapping standard Gaussian noise to what they call 'guidance-free noise', the same diffusion pipeline can generate high-quality images without any guidance, making sampling faster and more memory-efficient. Because the refining model learns directly in noise space, it converges quickly and performs well with only 50,000 text-image pairs.
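To make the idea concrete, here is a hedged sketch of what such a guidance-free pipeline could look like with the diffusers library; the `noise_refiner` stub, the checkpoint identifier, and the latent shape are assumptions for illustration, not the paper's released code.

```python
import torch
from diffusers import StableDiffusionPipeline

def noise_refiner(latents: torch.Tensor, prompt: str) -> torch.Tensor:
    # Stand-in for the paper's learned noise-refining model, which would map
    # Gaussian noise to "guidance-free" noise in a single forward pass.
    # Identity here only to keep the sketch self-contained.
    return latents

# Placeholder checkpoint; substitute whichever Stable Diffusion weights you use.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of a red fox in the snow"
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),  # SD 1.5 latent shape for 512x512
    dtype=torch.float16, device="cuda",
)

# One-time refinement of the initial noise, then standard sampling with
# guidance disabled (guidance_scale=1.0 means one denoiser pass per step).
refined = noise_refiner(latents, prompt)
image = pipe(prompt, latents=refined, guidance_scale=1.0).images[0]
image.save("guidance_free_sample.png")
```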
Why it matters?
This research is important because it simplifies the image generation process in diffusion models, making them more efficient and easier to use. By eliminating the need for complex guidance methods, this approach can lead to faster image generation and lower resource usage, which is beneficial for various applications in AI, such as art generation, video game design, and more.
Abstract
Diffusion models excel in generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By mapping Gaussian noise to 'guidance-free noise', we uncover that small low-magnitude low-frequency components significantly enhance the denoising process, removing the need for guidance and thus improving both inference throughput and memory. Expanding on this, we propose NoiseRefine, a novel method that replaces guidance methods with a single refinement of the initial noise. This refined noise enables high-quality image generation without guidance, within the same diffusion pipeline. Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs. We validate its effectiveness across diverse metrics and analyze how refined noise can eliminate the need for guidance. See our project page: https://cvlab-kaist.github.io/NoiseRefine/.
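The abstract attributes the effect to small low-magnitude low-frequency components in the refined noise. As a purely illustrative way to probe that kind of claim (the cutoff and metric below are assumptions, not the paper's analysis), one could compare the low-frequency spectral energy of a noise tensor before and after refinement:

```python
import torch

def low_frequency_energy_fraction(noise: torch.Tensor, cutoff: int = 4) -> float:
    """Fraction of 2D spectral energy inside a small low-frequency window.

    Comparing this quantity for standard Gaussian noise and for refined
    'guidance-free' noise is one simple way to look for the low-frequency
    differences the paper describes.
    """
    spectrum = torch.fft.fftshift(torch.fft.fft2(noise), dim=(-2, -1))
    power = spectrum.abs() ** 2
    h, w = power.shape[-2:]
    ch, cw = h // 2, w // 2
    low = power[..., ch - cutoff:ch + cutoff, cw - cutoff:cw + cutoff].sum()
    return (low / power.sum()).item()

# Baseline: pure Gaussian noise at a typical latent resolution.
gaussian = torch.randn(4, 64, 64)
print(f"low-frequency energy fraction: {low_frequency_energy_fraction(gaussian):.4f}")
```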