Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Seyedmorteza Sadat, Otmar Hilliges, Romann M. Weber
2024-10-04

Summary
This paper introduces Adaptive Projected Guidance (APG), a modification of classifier-free guidance that reduces the oversaturation and artifacts diffusion models produce at high guidance scales.
What's the problem?
In diffusion models, classifier-free guidance (CFG) is used to improve both the quality of generated images and how closely they match the input condition, such as a text prompt. However, when the guidance scale is set too high, it causes oversaturation (colors that look overly intense and unnatural) and other visual artifacts. Both make the generated images less realistic.
What's the solution?
To solve this problem, the authors revisit the CFG update rule and split its update term into two components: one parallel to the conditional model prediction and one orthogonal to it. They observe that the parallel component is the main cause of oversaturation, while the orthogonal component improves image quality. Their method, APG, down-weights the parallel component and adds rescaling and momentum to the update, motivated by a connection between CFG and gradient ascent. This allows higher guidance scales without oversaturation. The method is easy to implement and adds practically no computational overhead to the sampling process.
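As a rough illustration of the parallel and orthogonal decomposition described in the abstract, the update can be written out as follows. The symbols are notation assumed here for illustration: $D_c$ and $D_u$ stand for the conditional and unconditional model predictions, $w$ for the guidance scale, and $\eta$ for a hypothetical weight on the parallel component. The rescaling and momentum steps are left out, since this summary does not specify them.

```latex
\begin{align*}
\Delta &= D_c - D_u, \\
\tilde{D}_{\text{CFG}} &= D_u + w\,\Delta = D_c + (w-1)\,\Delta, \\
\Delta^{\parallel} &= \frac{\langle \Delta, D_c \rangle}{\langle D_c, D_c \rangle}\, D_c,
\qquad \Delta^{\perp} = \Delta - \Delta^{\parallel}, \\
\tilde{D}_{\eta} &= D_c + (w-1)\left(\Delta^{\perp} + \eta\,\Delta^{\parallel}\right),
\qquad 0 \le \eta \le 1.
\end{align*}
```

Setting $\eta = 1$ recovers standard CFG, while $\eta < 1$ down-weights the component that the authors associate with oversaturation.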
Why does it matter?
This research matters because it lets diffusion models keep the quality and prompt-alignment benefits of strong guidance without the oversaturation that usually comes with it. Because APG works as a plug-and-play replacement for CFG across different conditional diffusion models and samplers, it can improve results in applications such as art generation, photo editing, and other technology that relies on generating high-quality visual content.
Abstract
Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.
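The projection step described in the abstract can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `eta` weight on the parallel component, and the single-sample flattened dot product are all choices made here for illustration, and the rescaling and momentum parts of APG are omitted.

```python
import numpy as np


def project(delta, ref, eps=1e-12):
    """Split `delta` into parts parallel and orthogonal to `ref`.

    Both arrays are treated as flat vectors (one sample at a time).
    """
    ref_flat = ref.ravel()
    coef = np.dot(delta.ravel(), ref_flat) / (np.dot(ref_flat, ref_flat) + eps)
    parallel = coef * ref
    orthogonal = delta - parallel
    return parallel, orthogonal


def cfg(pred_cond, pred_uncond, scale):
    """Standard classifier-free guidance update."""
    return pred_uncond + scale * (pred_cond - pred_uncond)


def projected_guidance(pred_cond, pred_uncond, scale, eta=0.0):
    """CFG with the parallel part of the update down-weighted by `eta`.

    `eta` is a hypothetical parameter name; eta=1.0 reduces to standard CFG.
    """
    delta = pred_cond - pred_uncond
    parallel, orthogonal = project(delta, pred_cond)
    return pred_cond + (scale - 1.0) * (orthogonal + eta * parallel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cond = rng.normal(size=(3, 8, 8))    # stand-in for a conditional prediction
    uncond = rng.normal(size=(3, 8, 8))  # stand-in for an unconditional prediction
    guided = projected_guidance(cond, uncond, scale=7.5, eta=0.0)
    print(guided.shape)
```

In a real sampler, `pred_cond` and `pred_uncond` would come from two forward passes of the diffusion model at each step, and the projection would typically be applied per sample across a batch.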