HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration
Desen Sun, Jason Hon, Jintao Zhang, Sihang Liu
2026-03-16
Summary
This paper introduces a new way to speed up the process of creating images from text using diffusion models, which are known for their high-quality results but also their slow speed.
What's the problem?
While diffusion models are great at making images from text, they require a lot of computing power, especially the bigger, more detailed models. Previous attempts to speed things up focused only on reducing computation at certain timesteps of the generation process, ignoring that some parts of an image are harder to render than others and need more processing time.
What's the solution?
The researchers developed a method called HybridStitch that treats image generation like editing an image. It divides the image into two areas: simpler parts and complex parts. A smaller, faster model quickly creates a rough version of the whole image, and then the larger, more powerful model focuses on refining only the complex areas that need more detail. This combines the speed of a smaller model with the quality of a larger one.
Why it matters?
HybridStitch significantly speeds up image generation: in their tests, it achieved a 1.83× speedup on Stable Diffusion 3, outperforming other methods that combine models of different sizes. This means we could potentially get high-quality images from text much faster, making this technology more accessible and practical for everyday use.
Abstract
Diffusion models have demonstrated a remarkable ability in Text-to-Image (T2I) generation applications. Despite their advanced generation output, they suffer from heavy computation overhead, especially large models that contain tens of billions of parameters. Prior work has shown that replacing part of the denoising steps with a smaller model still maintains generation quality. However, these methods only focus on saving computation for some timesteps, ignoring the difference in compute demand within one timestep. In this work, we propose HybridStitch, a new T2I generation paradigm that treats generation like editing. Specifically, we introduce a hybrid stage that jointly incorporates both the large model and the small model. HybridStitch separates the entire image into two regions: one that is relatively easy to render, enabling an early transition to the smaller model, and another that is more complex and therefore requires refinement by the large model. HybridStitch employs the small model to construct a coarse sketch while exploiting the large model to edit and refine the complex regions. According to our evaluation, HybridStitch achieves a 1.83× speedup on Stable Diffusion 3, which is faster than all existing mixture-of-model methods.
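The hybrid stage described in the abstract can be sketched as a denoising loop that routes pixels to the two models by a complexity mask. The sketch below is a minimal, hypothetical illustration of that control flow, not the authors' implementation: `small_model`, `large_model`, and the variance-based `complexity_mask` are stand-in functions invented here for clarity.

```python
# Hypothetical sketch of the HybridStitch hybrid stage: the small model
# produces a coarse update everywhere, while the large model "edits" only
# the regions the mask marks as complex. Not the paper's actual code.
import numpy as np

def small_model(x, t):
    # Stand-in for the lightweight denoiser: cheap, coarse noise removal.
    return x * 0.9

def large_model(x, t):
    # Stand-in for the large denoiser: expensive, stronger refinement.
    return x * 0.5

def complexity_mask(x, threshold=1.0):
    # Toy proxy for "hard to render": pixels with large magnitude.
    # A real system would use a learned or statistical difficulty signal.
    return np.abs(x) > threshold

def hybrid_stitch(x, n_steps=10, switch_step=5):
    for t in range(n_steps):
        if t < switch_step:
            # Hybrid stage: small model sketches the whole image,
            # large model refines only the complex regions.
            mask = complexity_mask(x)
            coarse = small_model(x, t)
            refined = large_model(x, t)
            x = np.where(mask, refined, coarse)
        else:
            # Easy regions have transitioned early; the small model
            # alone finishes the remaining timesteps.
            x = small_model(x, t)
    return x
```

Because the large model runs only on masked regions during the hybrid stage (and not at all afterward), most of the per-step compute shifts to the small model, which is where the speedup comes from.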