DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
Yuang Ai, Qihang Fan, Xuefeng Hu, Zhenheng Yang, Ran He, Huaibo Huang
2025-05-22
Summary
This paper introduces DiCo, a family of diffusion models built from classic convolutional neural networks (ConvNets) instead of transformers, enabling high-quality image generation that is both faster and more efficient.
What's the problem?
Most state-of-the-art diffusion models for image generation rely on transformers, which are powerful but slow and resource-hungry. A major cost is global self-attention, which compares every image location with every other one, yet this expensive operation often turns out to be unnecessary for producing detailed images.
What's the solution?
The researchers designed DiCo by augmenting standard ConvNets with a compact channel attention mechanism, which keeps the model efficient while preserving its ability to produce diverse, realistic images. They showed that DiCo can outperform transformer-based models in both image quality and generation speed.
Why it matters?
Because DiCo generates top-quality images faster and with less computing power, it makes advanced AI image generation more practical and accessible.
Abstract
Diffusion ConvNet (DiCo) builds diffusion models from standard ConvNet modules combined with a compact channel attention mechanism, achieving high image quality and generation speed in visual generation tasks with substantial efficiency gains over Diffusion Transformer (DiT).
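The summary does not spell out DiCo's exact channel attention design, but the general idea behind compact channel attention (as in squeeze-and-excitation blocks) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all weight shapes, the bottleneck ratio `r`, and the helper names are assumptions for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Illustrative SE-style channel attention on a feature map x of shape (C, H, W).

    Unlike global self-attention, which costs O((H*W)^2), this only pools
    spatially and mixes channels, so its cost is independent of image size
    beyond the pooling step.
    """
    # Squeeze: global average pool over spatial dimensions -> per-channel summary (C,)
    s = x.mean(axis=(1, 2))
    # Excite: bottleneck MLP produces a gate in (0, 1) for each channel
    g = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))
    # Reweight: scale each channel of the feature map by its gate
    return x * g[:, None, None]

# Hypothetical shapes: 8 channels, 4x4 spatial map, bottleneck ratio r = 2
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))   # squeeze projection (assumed shape)
w2 = rng.standard_normal((C, C // r))   # excite projection (assumed shape)
y = channel_attention(x, w1, w2)
```

Each output channel is the corresponding input channel scaled by a single learned gate, which is what makes this form of attention so cheap relative to token-to-token self-attention.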