The Superposition of Diffusion Models Using the Itô Density Estimator
Marta Skreta, Lazar Atanackovic, Avishek Joey Bose, Alexander Tong, Kirill Neklyudov
2024-12-30

Summary
This paper introduces SuperDiff, a method for combining different pre-trained diffusion models at inference time, without retraining a larger model from scratch.
What's the problem?
As more pre-trained diffusion models become available, there is a growing need for ways to combine them to generate better results. However, combining models usually means retraining a new, larger one, which demands computational power and time that many users do not have. In addition, existing methods often fail to exploit the complementary strengths of the individual models during generation.
What's the solution?
To address this, the authors propose the superposition framework, which combines multiple pre-trained diffusion models at the generation stage rather than during training. They develop two algorithms built on a technique called the Itô density estimator, which tracks each model's log density along the sampling trajectory at essentially no extra cost, since it reuses the scores and noise already computed for sampling. These density estimates drive an automated re-weighting of the models' vector fields, so SuperDiff scales to large pre-trained models and composes them seamlessly during inference. The resulting combinations mimic traditional logical operations: OR (sample from a mixture of the models) and AND (sample points that all models consider likely). A minimal sketch of the OR-style step is given below.
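The following sketch illustrates one OR-style superposition step. It is not the authors' implementation: the drift and noise schedule are simplified to a variance-exploding form (forward drift f = 0), the toy Gaussian scores stand in for pre-trained score networks, and all function names are hypothetical.

```python
import numpy as np

def gaussian_score(x, mean, var):
    # Score (gradient of the log density) of an isotropic Gaussian;
    # a toy stand-in for a pre-trained diffusion model's score network.
    return (mean - x) / var

def superdiff_or_step(x, logq, scores, g, dt, rng):
    # One reverse-SDE step superposing k score models (OR-style mixture).
    # x: sample (d,); logq: running Ito log-density estimates (k,);
    # scores: list of callables s_i(x); g: diffusion coefficient;
    # dt: positive reverse-time step; rng: np.random.Generator.
    s = np.stack([score(x) for score in scores])        # (k, d)
    kappa = np.exp(logq - logq.max())                   # weights ~ q_i(x)
    kappa /= kappa.sum()
    s_mix = kappa @ s                                   # density-weighted score
    dW = rng.normal(size=x.shape) * np.sqrt(dt)
    x_new = x + g**2 * s_mix * dt + g * dW              # reverse SDE, drift f = 0
    # Ito density update per model, reusing the same scores and noise dW.
    logq = logq + np.array([0.5 * g**2 * (si @ si) * dt + g * (si @ dW)
                            for si in s])
    return x_new, logq

# Toy usage: superpose two Gaussian "models" (OR = sample from either mode).
rng = np.random.default_rng(0)
scores = [lambda x: gaussian_score(x, np.full(2, -2.0), 1.0),
          lambda x: gaussian_score(x, np.full(2, +2.0), 1.0)]
x = rng.normal(size=2) * 3.0
logq = np.zeros(2)   # only relative values matter for the weights
for _ in range(500):
    x, logq = superdiff_or_step(x, logq, scores, g=0.3, dt=0.01, rng=rng)
```

The key design point is that the score evaluations and noise increments used for the sampling step are reused to update each model's log-density estimate, which is what makes the automated re-weighting essentially free. The weighting follows the mixture-score identity: the score of a mixture is the posterior-weighted sum of the individual scores.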
Why it matters?
This research is important because it makes it easier for users to leverage multiple diffusion models without needing extensive resources or time. By improving how these models can work together, SuperDiff can enhance applications in areas like image generation and editing, leading to more diverse and high-quality outputs in creative fields such as art and design.
Abstract
The Cambrian explosion of easily accessible pre-trained diffusion models suggests a demand for methods that combine multiple different pre-trained diffusion models without incurring the significant computational burden of re-training a larger combined model. In this paper, we cast the problem of combining multiple pre-trained diffusion models at the generation stage under a novel proposed framework termed superposition. Theoretically, we derive superposition from rigorous first principles stemming from the celebrated continuity equation and design two novel algorithms tailor-made for combining diffusion models in SuperDiff. SuperDiff leverages a new scalable Itô density estimator for the log likelihood of the diffusion SDE which incurs no additional overhead compared to the well-known Hutchinson's estimator needed for divergence calculations. We demonstrate that SuperDiff is scalable to large pre-trained diffusion models as superposition is performed solely through composition during inference, and also enjoys painless implementation as it combines different pre-trained vector fields through an automated re-weighting scheme. Notably, we show that SuperDiff is efficient during inference time, and mimics traditional composition operators such as the logical OR and the logical AND. We empirically demonstrate the utility of using SuperDiff for generating more diverse images on CIFAR-10, more faithful prompt conditioned image editing using Stable Diffusion, and improved unconditional de novo structure design of proteins. Code: https://github.com/necludov/super-diffusion
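For reference, the Itô density estimator highlighted in the abstract can be sketched as follows. This is a reconstruction from the Itô's-lemma/continuity-equation argument the abstract describes; consult the paper for the exact statement and sign conventions.

```latex
% Forward SDE  dx_t = f(x_t, t)\,dt + g(t)\,dW_t  with marginals q_t.
% Applying Ito's lemma to log q_t along the reverse-time SDE, the
% Laplacian terms cancel, leaving (in reverse time \tau = T - t):
\[
  \mathrm{d}\log q_t(x_t)
  = \Big[\nabla\!\cdot f(x_t, t)
    + \tfrac{g(t)^2}{2}\,\lVert \nabla_x \log q_t(x_t)\rVert^2\Big]\,\mathrm{d}\tau
  + g(t)\,\big\langle \nabla_x \log q_t(x_t),\, \mathrm{d}W_\tau \big\rangle .
\]
% For a VP-type drift f(x,t) = -\tfrac{1}{2}\beta(t)\,x, the divergence is
% the closed-form constant \nabla\cdot f = -\tfrac{d}{2}\beta(t), so the
% update reuses the score and the Brownian increments already computed for
% sampling -- no Hutchinson-style divergence estimate is required.
```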