Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion
Zhongjie Duan, Hong Zhang, Yingda Chen
2026-04-30
Summary
This paper introduces Diffusion Templates, a plugin framework designed to make it easier to control how images are generated by diffusion models.
What's the problem?
Currently, different methods for controlling diffusion models are built separately for each specific base model, making them incompatible with each other. This means you can't easily reuse tools across different image generation models, transfer control techniques between them, or combine multiple controls in a single generation. It's like having a different remote control for every TV, sound system, and DVD player: messy and inefficient.
What's the solution?
The researchers created Diffusion Templates, which acts like a universal adapter. It has three main parts: Template models that translate task-specific inputs (such as an edge map or a style image) into a format the diffusion model understands, a Template cache that stores these instructions in a standardized way, and a Template pipeline that loads, merges, and applies one or more caches during image generation. Because the interface is standardized, different kinds of control methods, from adjusting image details to changing the style, all fit under one framework.
Why it matters?
This work is important because it simplifies and unifies the process of controlling diffusion models. It allows researchers and artists to easily share and combine different control techniques, and it makes it easier to apply these techniques to new and improved image generation models as they are developed. Ultimately, it makes creating exactly the image you want much more accessible and flexible.
Abstract
Controllable diffusion methods have substantially expanded the practical utility of diffusion models, but they are typically developed as isolated, backbone-specific systems with incompatible training pipelines, parameter formats, and runtime hooks. This fragmentation makes it difficult to reuse infrastructure across tasks, transfer capabilities across backbones, or compose multiple controls within a single generation pipeline. We present Diffusion Templates, a unified and open plugin framework that decouples base-model inference from controllable capability injection. The framework is organized around three components: Template models that map arbitrary task-specific inputs to an intermediate capability representation, a Template cache that functions as a standardized interface for capability injection, and a Template pipeline that loads, merges, and injects one or more Template caches into the base diffusion runtime. Because the interface is defined at the systems level rather than tied to a specific control architecture, heterogeneous capability carriers such as KV-Cache and LoRA can be supported under the same abstraction. Based on this design, we build a diverse model zoo spanning structural control, brightness adjustment, color adjustment, image editing, super-resolution, sharpness enhancement, aesthetic alignment, content reference, local inpainting, and age control. These case studies show that Diffusion Templates can unify a broad range of controllable generation tasks while preserving modularity, composability, and practical extensibility across rapidly evolving diffusion backbones. All resources will be open sourced, including code, models, and datasets.
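To make the three-component design concrete, here is a minimal sketch in Python. All class and method names (`TemplateModel`, `TemplateCache`, `TemplatePipeline`, `encode`, `merge`, `generate`) are illustrative assumptions, not the framework's actual API; the point is only to show how decoupling capability injection from base-model inference makes controls composable.

```python
# Hypothetical sketch of the Diffusion Templates design; names are
# illustrative, not the real API.
from dataclasses import dataclass, field


@dataclass
class TemplateCache:
    """Standardized capability representation (the paper mentions
    heterogeneous carriers such as KV-Cache and LoRA)."""
    entries: dict = field(default_factory=dict)

    def merge(self, other: "TemplateCache") -> "TemplateCache":
        # Composing multiple controls = merging their caches into one.
        merged = dict(self.entries)
        merged.update(other.entries)
        return TemplateCache(merged)


class TemplateModel:
    """Maps a task-specific input (edge map, palette, ...) to a cache."""
    def __init__(self, task: str):
        self.task = task

    def encode(self, condition) -> TemplateCache:
        return TemplateCache({self.task: condition})


class TemplatePipeline:
    """Loads one or more caches and injects them into the base runtime."""
    def __init__(self, base_model):
        self.base_model = base_model

    def generate(self, prompt: str, *caches: TemplateCache):
        combined = TemplateCache()
        for cache in caches:
            combined = combined.merge(cache)
        # A real implementation would hook `combined` into the
        # denoising loop; here the base model is just a callable.
        return self.base_model(prompt, combined)


# Usage: two independent controls composed in one generation call.
pipeline = TemplatePipeline(base_model=lambda p, c: (p, sorted(c.entries)))
structure = TemplateModel("structure").encode("edge_map")
color = TemplateModel("color").encode("palette")
result = pipeline.generate("a cat", structure, color)
```

Note how the base model never sees task-specific inputs directly, only merged caches, which is what lets new controls be added or combined without touching the base inference code.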