
Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee

2026-02-27


Summary

This paper focuses on making diffusion models, which are really good at creating realistic images, videos, and audio, work much faster. Currently, generating content with these models takes a lot of computing power.

What's the problem?

While people have tried using multiple computer graphics cards (GPUs) to speed things up, these methods often result in noticeable flaws in the generated content and don't achieve the speed increases you'd expect from adding more GPUs. Essentially, simply splitting the work isn't enough to get a good balance between speed and quality.

What's the solution?

The researchers developed a new system that combines two main ideas. First, they split the work based on how the diffusion model actually creates images: conditional diffusion models run two denoising paths at every step, one guided by the prompt (conditional) and one that ignores it (unconditional), and these two paths can be treated as separate pieces of work and assigned to different GPUs. Second, they designed a scheduler that automatically switches between different ways of distributing the work across GPUs, choosing the most efficient strategy depending on how much the guided and unguided denoising paths differ at that point in the generation process. This 'hybrid parallelism' approach adapts to the specific task at hand.
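To make the switching idea concrete, here is a minimal, hypothetical sketch. It is not the authors' implementation (see their repository for that); the function names, the normalized-L2 discrepancy metric, and the threshold value are all invented for illustration. The only idea taken from the summary is the rule itself: while the two denoising paths still disagree strongly, run them as separate data-parallel branches (one per GPU); once they converge, switch to pipeline parallelism.

```python
import numpy as np

def guidance_discrepancy(eps_cond, eps_uncond):
    """Hypothetical metric: L2 distance between the conditional and
    unconditional noise predictions, normalized by the conditional norm."""
    return float(np.linalg.norm(eps_cond - eps_uncond)
                 / (np.linalg.norm(eps_cond) + 1e-8))

def choose_parallelism(discrepancy, threshold=0.1):
    """Hypothetical switching rule: large discrepancy -> the two paths are
    worth computing independently on separate GPUs (data parallel); small
    discrepancy -> split the model into stages across GPUs instead
    (pipeline parallel)."""
    return "data_parallel" if discrepancy > threshold else "pipeline_parallel"

# Toy demonstration with random stand-ins for noise predictions.
rng = np.random.default_rng(0)
eps_c = rng.standard_normal((4, 64, 64))
# Early denoising step: the unconditional path differs noticeably.
eps_u_early = eps_c + 0.5 * rng.standard_normal((4, 64, 64))
# Late denoising step: the two paths have nearly converged.
eps_u_late = eps_c + 0.001 * rng.standard_normal((4, 64, 64))

d_early = guidance_discrepancy(eps_c, eps_u_early)
d_late = guidance_discrepancy(eps_c, eps_u_late)
print(choose_parallelism(d_early), choose_parallelism(d_late))
```

In this toy run the early step selects the data-parallel mode and the late step selects the pipeline mode, mirroring the adaptive behavior described above.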

Why it matters?

This work is important because it significantly speeds up generation with diffusion models while preserving image quality, making them more practical for real-world applications. The authors report roughly 2.3× faster generation on SDXL and 2.1× on SD3 using just two GPUs, and their method continues to perform well when creating very high-resolution images, where it outperforms other existing acceleration techniques. This means faster creation of detailed and realistic content.

Abstract

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism suffer from noticeable generation artifacts and fail to achieve substantial acceleration proportional to the number of GPUs. Therefore, we propose a hybrid parallelism framework that combines a novel data parallel strategy, condition-based partitioning, with an optimal pipeline scheduling method, adaptive parallelism switching, to reduce generation latency and achieve high generation quality in conditional diffusion models. The key ideas are to (i) leverage the conditional and unconditional denoising paths as a new data-partitioning perspective and (ii) adaptively enable optimal pipeline parallelism according to the denoising discrepancy between these two paths. Our framework achieves 2.31× and 2.07× latency reductions on SDXL and SD3, respectively, using two NVIDIA RTX 3090 GPUs, while preserving image quality. This result confirms the generality of our approach across U-Net-based diffusion models and DiT-based flow-matching architectures. Our approach also outperforms existing methods in acceleration under high-resolution synthesis settings. Code is available at https://github.com/kaist-dmlab/Hybridiff.
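For background (this equation is standard classifier-free guidance, not taken from the paper itself): the "conditional and unconditional denoising paths" in the abstract refer to the two noise predictions that guided diffusion models combine at every denoising step,

```latex
\epsilon^{\text{guided}}_\theta(x_t, c)
  = \epsilon_\theta(x_t, \varnothing)
  + s \,\bigl( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \bigr),
```

where $c$ is the conditioning prompt, $\varnothing$ denotes the null (unconditional) input, and $s$ is the guidance scale. Because $\epsilon_\theta(x_t, c)$ and $\epsilon_\theta(x_t, \varnothing)$ are independent forward passes through the same network, they can in principle be computed on different GPUs, which is the data-partitioning perspective the abstract describes.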