Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention
Susung Hong
2024-08-02

Summary
This paper introduces Smoothed Energy Guidance (SEG), a training- and condition-free guidance method for diffusion models. SEG smooths the energy landscape of the self-attention mechanism and uses the resulting output in place of the unconditional prediction, which improves image quality while reducing unwanted side effects.
What's the problem?
Diffusion models, which are used for generating visual content, owe much of their image quality to classifier-free guidance (CFG), a technique that depends on a condition such as a text prompt. Recent attempts to extend guidance to unconditional models rely on heuristic techniques, which leads to suboptimal generation quality and unintended effects such as artifacts (unwanted visual errors).
What's the solution?
To solve these issues, the authors propose SEG, which takes an energy-based view of the self-attention mechanism used in these models. After defining the energy of self-attention, SEG reduces the curvature of that energy landscape and uses the resulting output as the unconditional prediction for guidance. In practice, the curvature is controlled by adjusting the Gaussian kernel parameter while the guidance scale is kept fixed. The paper also introduces query blurring, which is equivalent to blurring the entire attention weights but avoids quadratic complexity in the number of tokens. In the authors' experiments, SEG achieves a Pareto improvement in both image quality and the reduction of side effects.
Why does it matter?
This research matters because it makes guidance available without a condition and without extra training, so image quality can be improved with fewer side effects. Because diffusion models underlie many image generation tools, SEG could benefit areas such as art creation, video game design, and virtual reality.
Abstract
Conditional diffusion models have shown remarkable success in visual content generation, producing high-quality samples across various domains, largely due to classifier-free guidance (CFG). Recent attempts to extend guidance to unconditional models have relied on heuristic techniques, resulting in suboptimal generation quality and unintended effects. In this work, we propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation. By defining the energy of self-attention, we introduce a method to reduce the curvature of the energy landscape of attention and use the output as the unconditional prediction. Practically, we control the curvature of the energy landscape by adjusting the Gaussian kernel parameter while keeping the guidance scale parameter fixed. Additionally, we present a query blurring method that is equivalent to blurring the entire attention weights without incurring quadratic complexity in the number of tokens. In our experiments, SEG achieves a Pareto improvement in both quality and the reduction of side effects. The code is available at https://github.com/SusungHong/SEG-SDXL.
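The query blurring claim in the abstract can be illustrated with a small numerical sketch. It assumes that "attention weights" refers to the pre-softmax scores QK^T and that the blur is a linear Gaussian filter along the query-token axis; under those assumptions the equivalence follows from associativity of matrix multiplication. The 1D blur matrix, the function names, and the CFG-style combination in `guide` are illustrative assumptions, not the authors' implementation, which is in the linked repository.

```python
import numpy as np

def gaussian_blur_matrix(n_tokens: int, sigma: float) -> np.ndarray:
    """Row-normalized 1D Gaussian blur over the token axis (illustrative only)."""
    idx = np.arange(n_tokens)
    dist2 = (idx[:, None] - idx[None, :]) ** 2
    kernel = np.exp(-dist2 / (2.0 * sigma ** 2))
    return kernel / kernel.sum(axis=1, keepdims=True)

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Plain single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def guide(pred: np.ndarray, pred_smoothed: np.ndarray, scale: float) -> np.ndarray:
    """CFG-style combination (assumed form): the smoothed-attention output
    stands in for the unconditional prediction."""
    return pred_smoothed + scale * (pred - pred_smoothed)

rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
Q = rng.normal(size=(n_tokens, dim))
K = rng.normal(size=(n_tokens, dim))
V = rng.normal(size=(n_tokens, dim))

# A larger sigma gives a stronger blur; this is the kernel parameter being varied.
B = gaussian_blur_matrix(n_tokens, sigma=2.0)

# Blurring the N x N scores directly vs. blurring only the N x d queries.
scores_blurred_directly = B @ (Q @ K.T)
scores_from_blurred_query = (B @ Q) @ K.T
equivalent = np.allclose(scores_blurred_directly, scores_from_blurred_query)

# Toy stand-ins for the model's original and smoothed predictions.
out_original = attention(Q, K, V)
out_smoothed = attention(B @ Q, K, V)
out_guided = guide(out_original, out_smoothed, scale=3.0)
```

Here `equivalent` evaluates to `True`: filtering the queries touches an N x d array, whereas filtering the scores touches the full N x N matrix. In an image model the blur would act over the 2D spatial layout of the tokens rather than a 1D sequence, but the same linearity argument applies.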