
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

Haoyuan Wu, Haoxing Chen, Xiaodong Chen, Zhanchao Zhou, Tieyuan Chen, Yihong Zhuang, Guoshan Lu, Zenan Huang, Junbo Zhao, Lin Liu, Zhenzhong Lan, Bei Yu, Jianguo Li

2025-08-12


Summary

This paper introduces Grove MoE, a new Mixture of Experts (MoE) architecture that uses experts of different sizes and activates only the parts it needs based on how complicated the input is. This makes large language models both faster and better at understanding and generating text.

What's the problem?

The problem is that large language models can be slow and use a lot of computing power because dense models activate all of their parameters for every task, even when the task is simple. Traditional MoE models help, but they usually make every expert the same size, so each input activates the same amount of computation whether it needs it or not.

What's the solution?

Grove MoE solves this by building the model from heterogeneous experts, meaning the experts come in different sizes and capacities. It dynamically chooses which experts to activate depending on how complex the input is, so simpler inputs use smaller experts and more complex inputs use bigger ones. Because the amount of activated computation adapts to each input, the model becomes more efficient without losing performance.
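To make the idea concrete, here is a minimal sketch of an MoE layer with experts of different sizes, written in PyTorch. It is not the paper's Grove MoE implementation; the names (HeterogeneousMoE, expert_hidden_sizes, top_k) are illustrative assumptions. It only shows how routing tokens to experts with different hidden widths makes the number of activated parameters depend on the input.

```python
# Minimal sketch (assumption, not the paper's code): an MoE layer whose
# experts have different hidden widths, so tokens routed to large experts
# activate more parameters than tokens routed to small ones.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeterogeneousMoE(nn.Module):
    def __init__(self, d_model: int, expert_hidden_sizes: list[int], top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Experts of different sizes: a smaller hidden width means less compute.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, h), nn.GELU(), nn.Linear(h, d_model))
            for h in expert_hidden_sizes
        )
        # Router scores every expert for every token.
        self.router = nn.Linear(d_model, len(expert_hidden_sizes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_logits = self.router(x)                          # (T, num_experts)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)   # pick k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    # Only tokens routed to expert e pay for its compute, so
                    # the activated parameter count varies with the routing.
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = HeterogeneousMoE(d_model=64, expert_hidden_sizes=[64, 128, 256, 512], top_k=2)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

In this sketch the routing decision alone determines how much computation a token receives; Grove MoE's contribution is an architecture (with adjugate experts) that makes this kind of adaptive, size-aware activation efficient in practice.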

Why does it matter?

This matters because it helps AI models work faster and smarter by using just the right amount of computing power for each task. This improvement can make AI more accessible, reduce energy use, and allow bigger models to work on more devices or handle harder problems without slowing down.

Abstract

Grove MoE, a novel architecture with heterogeneous experts of varying sizes, improves computational efficiency and performance in large language models by dynamically activating parameters based on input complexity.