Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
Zihan Wang, Rui Pan, Jiarui Yao, Robert Csordas, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li, Shiwei Liu
2025-06-25
Summary
This paper introduces Chain-of-Experts (CoE), a method that improves mixture-of-experts (MoE) models by routing tokens sequentially through several expert networks within each layer, rather than processing them with all selected experts in parallel.
What's the problem?
Traditional mixture-of-experts models activate only a subset of experts for each input, which keeps computation sparse but limits communication between experts: each token's selected experts work independently, so they cannot build on one another's outputs. Despite this sparsity, the models still need a lot of memory to hold every expert's parameters.
What's the solution?
The researchers designed CoE, which routes tokens through multiple experts inside the same layer in sequence. Because the router re-selects experts at each step based on the token's updated representation, later experts can refine the outputs of earlier ones, enabling richer interaction and information sharing among experts while using parameters, and therefore memory, more efficiently.
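To make the sequential-routing idea concrete, here is a minimal toy sketch, not the paper's implementation: a single "layer" holds a shared pool of tiny experts, and a token's hidden vector is routed through the pool for several iterations, with the router re-scoring experts after each update. All function names and the router/expert forms here are illustrative assumptions.

```python
import math

# Toy sketch of Chain-of-Experts-style iterative routing (illustrative only;
# the real model uses learned routers and transformer FFN experts).

def make_expert(scale):
    # Hypothetical expert: a simple elementwise nonlinear transform.
    return lambda h: [math.tanh(scale * x) for x in h]

def router_scores(h, num_experts):
    # Hypothetical router: score each expert from the current hidden state,
    # so re-routing at each iteration depends on earlier experts' outputs.
    return [sum(x * (e + 1) / num_experts for x in h) for e in range(num_experts)]

def coe_layer(h, experts, num_iterations=2, top_k=2):
    # Route the token through experts sequentially within one layer:
    # each iteration picks top-k experts, averages their outputs,
    # and adds a residual update before routing again.
    for _ in range(num_iterations):
        scores = router_scores(h, len(experts))
        chosen = sorted(range(len(experts)), key=lambda e: -scores[e])[:top_k]
        out = [0.0] * len(h)
        for e in chosen:
            y = experts[e](h)
            out = [o + v / top_k for o, v in zip(out, y)]
        h = [x + o for x, o in zip(h, out)]
    return h

experts = [make_expert(s) for s in (0.5, 1.0, 2.0)]
hidden = [0.1, -0.2, 0.3]
print(coe_layer(hidden, experts, num_iterations=2, top_k=2))
```

The key difference from a standard MoE layer is the loop: a standard layer would run the routing and expert mixing once, whereas here the routing decision is repeated on the updated state, letting experts within one layer communicate through the token.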
Why it matters?
This matters because it lets large AI models handle complex data with less memory, delivering stronger performance without requiring much more computing power.
Abstract
Chain-of-Experts (CoE) improves performance and memory efficiency in mixture-of-experts models by iteratively routing tokens through experts within each layer.