Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Yike Yuan, Ziyu Wang, Zihao Huang, Defa Zhu, Xun Zhou, Jingyi Yu, Qiyang Min

2025-03-21

Summary

This paper improves AI image generators by letting different parts of the model (called experts) specialize in different tasks and compete to handle the most important parts of the image.

What's the problem?

AI image generators can struggle to create high-quality images efficiently because they spend the same amount of computation on every part of the image, regardless of how important or difficult each part is.

What's the solution?

The researchers developed a routing method called Expert Race, used in their model Race-DiT. Instead of each image token independently picking its own experts, all tokens and experts compete together for a shared pool of top slots, so the model learns to assign more expert capacity to the most critical tokens. Extra training losses keep the experts balanced so no single expert dominates.
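To make the "compete together" idea concrete, here is a minimal sketch (my own illustration, not the authors' code) of the difference from standard routing: rather than taking a per-token top-k over experts, the router scores for all (token, expert) pairs are flattened and a single global top-k is taken, so important tokens can win more expert slots than unimportant ones. The function name and shapes are assumptions for illustration.

```python
import numpy as np

def expert_race_route(logits: np.ndarray, k: int) -> np.ndarray:
    """Global top-k routing over all (token, expert) pairs.

    logits: array of shape (num_tokens, num_experts) with router scores.
    Returns a boolean mask of the same shape marking the k selected
    (token, expert) pairs overall -- tokens and experts "race" jointly.
    """
    flat = logits.ravel()
    # Indices of the k highest-scoring pairs across the whole grid.
    top = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[top] = True
    return mask.reshape(logits.shape)

rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 4))          # 8 tokens, 4 experts
mask = expert_race_route(scores, k=8)
# Exactly 8 pairs are active, but they need not be spread one per token:
# a "critical" token with several high scores can claim multiple experts.
print(mask.sum())
```

Note how this differs from conventional top-k MoE routing, where each token is forced to use the same number of experts; here the budget is shared, which is what allows dynamic allocation to critical tokens.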

Why it matters?

This work matters because it can lead to AI image generators that are faster, more efficient, and produce higher-quality images.

Abstract

Diffusion models have emerged as a mainstream framework in visual generation. Building upon this success, the integration of Mixture of Experts (MoE) methods has shown promise in enhancing model scalability and performance. In this paper, we introduce Race-DiT, a novel MoE model for diffusion transformers with a flexible routing strategy, Expert Race. By allowing tokens and experts to compete together and select the top candidates, the model learns to dynamically assign experts to critical tokens. Additionally, we propose per-layer regularization to address challenges in shallow-layer learning, and a router similarity loss to prevent mode collapse, ensuring better expert utilization. Extensive experiments on ImageNet validate the effectiveness of our approach, showcasing significant performance gains along with promising scaling properties.