SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Viet Nguyen, Anh Aengus Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, Anh Tran

2024-12-05

Summary

This paper presents SNOOPI, a new framework designed to improve one-step diffusion models used for generating images from text descriptions by enhancing guidance during training and inference.

What's the problem?

One-step diffusion models, which simplify the process of generating images from text, often struggle with stability and flexibility. They rely on a fixed guidance scale during distillation, which can lead to inconsistent results across different model backbones. They also lack effective support for negative prompts, which are important for excluding unwanted elements from generated images.

What's the solution?

SNOOPI addresses these issues by introducing two main improvements: Proper Guidance-SwiftBrush (PG-SB) and Negative-Away Steer Attention (NASA). PG-SB allows for a variable guidance scale during training, which helps stabilize the model's performance across different architectures. NASA integrates negative prompts into the model's attention mechanism, enabling it to suppress unwanted features in the generated images. These enhancements make the one-step models more robust and effective at producing high-quality images.
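The core idea behind PG-SB can be illustrated with a short sketch. Classifier-free guidance extrapolates the conditional prediction away from the unconditional one by a guidance scale; instead of fixing that scale, PG-SB varies it during training. The sketch below is a minimal illustration of that idea on scalar predictions, not the paper's implementation: the `[low, high]` sampling range is a hypothetical choice, and the paper may use a different distribution.

```python
import random

def cfg_output(noise_cond, noise_uncond, scale):
    """Classifier-free guidance: extrapolate the conditional prediction
    away from the unconditional one by the guidance scale."""
    return noise_uncond + scale * (noise_cond - noise_uncond)

def random_scale_guidance(noise_cond, noise_uncond, low=2.0, high=8.0, rng=random):
    """PG-SB-style idea (sketch): sample a fresh guidance scale each
    training step rather than fixing it, which broadens the teacher's
    output distribution. The [low, high] range is illustrative only."""
    scale = rng.uniform(low, high)
    return cfg_output(noise_cond, noise_uncond, scale)
```

With a fixed scale, the teacher's outputs occupy a narrow distribution tied to that scale; sampling the scale per step exposes the student to a wider range of teacher behaviors, which is what the paper credits for the more robust VSD loss across backbones.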

Why it matters?

This research is important because it advances the capabilities of AI in generating images from text prompts. By improving the stability and control of one-step diffusion models, SNOOPI sets a new standard for image generation technologies. This can lead to better applications in various fields such as digital art, advertising, and content creation, where precise and appealing image generation is essential.

Abstract

Recent approaches have yielded promising results in distilling multi-step text-to-image diffusion models into one-step ones. The state-of-the-art efficient distillation technique, i.e., SwiftBrushv2 (SBv2), even surpasses the teacher model's performance with limited resources. However, our study reveals its instability when handling different diffusion model backbones due to using a fixed guidance scale within the Variational Score Distillation (VSD) loss. Another weakness of the existing one-step diffusion models is the missing support for negative prompt guidance, which is crucial in practical image generation. This paper presents SNOOPI, a novel framework designed to address these limitations by enhancing the guidance in one-step diffusion models during both training and inference. First, we effectively enhance training stability through Proper Guidance-SwiftBrush (PG-SB), which employs a random-scale classifier-free guidance approach. By varying the guidance scale of both teacher models, we broaden their output distributions, resulting in a more robust VSD loss that enables SB to perform effectively across diverse backbones while maintaining competitive performance. Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images. Our experimental results show that our proposed methods significantly improve baseline models across various metrics. Remarkably, we achieve an HPSv2 score of 31.08, setting a new state-of-the-art benchmark for one-step diffusion models.
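One plausible reading of the NASA mechanism described above can be sketched as follows: compute cross-attention features for both the positive and the negative prompt, then steer the result away from the negative-prompt features. This is a hedged illustration, not the paper's actual method; `nasa_steer` and the steering strength `alpha` are hypothetical names and values introduced here for clarity.

```python
def nasa_steer(attn_pos, attn_neg, alpha=0.5):
    """Negative-away steering (sketch): subtract a scaled negative-prompt
    attention feature from the positive-prompt one, suppressing content
    associated with the negative prompt. `alpha` is a hypothetical
    steering strength, not a value from the paper."""
    return [p - alpha * n for p, n in zip(attn_pos, attn_neg)]

# Toy usage: features attending to the negative prompt are pushed down.
steered = nasa_steer([1.0, 2.0], [0.5, 0.5], alpha=1.0)
```

Because this operates purely at inference time inside the attention computation, it requires no retraining, which matches the abstract's description of NASA as a training-free method.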