OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot

Junhan Zhu, Hesong Wang, Mingluo Su, Zefang Wang, Huan Wang

2025-10-09

Summary

This paper introduces a new method, OBS-Diff, for making large image-generating AI models more efficient without significantly sacrificing image quality.

What's the problem?

Generating images with diffusion models requires a lot of computing power, which makes large models expensive and slow to run. Existing one-shot pruning methods don't carry over well to diffusion models, because these models refine an image gradually over many denoising steps, so errors introduced by pruning can build up from step to step.

What's the solution?

OBS-Diff prunes the model, that is, it removes weights that contribute little to the output, in a single shot and without any retraining. It revives the classic Optimal Brain Surgeon technique, adapts it to modern diffusion architectures, and, by analyzing how pruning errors accumulate across the denoising steps, gives earlier timesteps more influence when deciding which weights to keep. A group-wise sequential pruning strategy keeps the expensive calibration process manageable. A rough sketch of the timestep-weighting idea follows.
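
The sketch below illustrates a timestep-aware Hessian built from calibration activations. The function names, the calibration loop, and the exact logarithmic-decrease formula are illustrative assumptions; the paper only states that the Hessian construction weights earlier denoising steps more heavily.

```python
import numpy as np

def timestep_weight(step, num_steps):
    """Illustrative logarithmic-decrease weighting (not the paper's exact formula):
    the first denoising steps get a weight near 1, later steps progressively less."""
    return np.log(num_steps - step + 1) / np.log(num_steps + 1)

def build_weighted_hessian(calib_inputs_per_step):
    """Accumulate a layer-wise Hessian H = sum_t lambda(t) * X_t^T X_t from the
    activations X_t that reach this layer at each denoising step t."""
    num_steps = len(calib_inputs_per_step)
    dim = calib_inputs_per_step[0].shape[1]              # layer input dimension
    H = np.zeros((dim, dim))
    for step, X_t in enumerate(calib_inputs_per_step):   # X_t: (tokens, dim)
        H += timestep_weight(step, num_steps) * (X_t.T @ X_t)
    return H
```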

Why it matters?

This research matters because it makes image generation with large diffusion models faster and cheaper, bringing these powerful tools within reach of more applications. The authors report state-of-the-art one-shot pruning results for diffusion models, with real inference speedups and little loss in visual quality.

Abstract

Large-scale text-to-image diffusion models, while powerful, suffer from prohibitive computational cost. Existing one-shot network pruning methods can hardly be directly applied to them due to the iterative denoising nature of diffusion models. To bridge the gap, this paper presents OBS-Diff, a novel one-shot pruning framework that enables accurate and training-free compression of large-scale text-to-image diffusion models. Specifically, (i) OBS-Diff revitalizes the classic Optimal Brain Surgeon (OBS), adapting it to the complex architectures of modern diffusion models and supporting diverse pruning granularity, including unstructured, N:M semi-structured, and structured (MHA heads and FFN neurons) sparsity; (ii) To align the pruning criteria with the iterative dynamics of the diffusion process, by examining the problem from an error-accumulation perspective, we propose a novel timestep-aware Hessian construction that incorporates a logarithmic-decrease weighting scheme, assigning greater importance to earlier timesteps to mitigate potential error accumulation; (iii) Furthermore, a computationally efficient group-wise sequential pruning strategy is proposed to amortize the expensive calibration process. Extensive experiments show that OBS-Diff achieves state-of-the-art one-shot pruning for diffusion models, delivering inference acceleration with minimal degradation in visual quality.
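
For background, the classic Optimal Brain Surgeon step that the method builds on can be sketched for a single weight row as below. This is the textbook OBS saliency and compensating update from the inverse Hessian, not the paper's full algorithm: keeping the inverse Hessian fixed between removals is a simplification (an exact variant updates it after each pruned weight), and the unstructured per-row selection shown here is only one of the granularities the paper supports.

```python
import numpy as np

def obs_prune_row(w, H_inv, sparsity):
    """Classic OBS pruning of one weight row, greatly simplified.

    w        : (d,) weights feeding one output unit
    H_inv    : (d, d) inverse of the layer Hessian (e.g. built from calibration data)
    sparsity : fraction of weights to zero out
    """
    w = w.astype(np.float64).copy()
    d = w.shape[0]
    pruned = np.zeros(d, dtype=bool)
    for _ in range(int(sparsity * d)):
        # OBS saliency: estimated loss increase from removing each remaining weight.
        saliency = w ** 2 / (2.0 * np.diag(H_inv))
        saliency[pruned] = np.inf
        q = int(np.argmin(saliency))                  # cheapest weight to remove
        # Compensating update that spreads the removed weight's contribution
        # over the remaining weights: delta_w = -(w_q / [H^-1]_qq) * H^-1[:, q]
        w -= (w[q] / H_inv[q, q]) * H_inv[:, q]
        pruned[q] = True
        w[pruned] = 0.0                               # keep pruned positions at zero
        # Note: a faithful implementation also updates H_inv after each removal.
    return w
```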