UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang, Chaozheng Wang, Jing Li
2025-05-13
Summary
This paper introduces UMoE, a Mixture-of-Experts architecture in which the attention layers and feed-forward layers of a model draw on a single shared pool of expert modules, making the whole system both more capable and more parameter-efficient.
What's the problem?
The problem is that existing Mixture-of-Experts models keep separate expert modules for their attention layers (which decide what information to focus on) and their feed-forward layers (which transform that information). Maintaining two disjoint sets of experts duplicates parameters across the two components, wasting capacity and limiting how well the model can learn.
What's the solution?
The researchers created a new architecture in which both the attention part and the feed-forward part of the model route tokens to the same shared set of expert modules. Reusing one pool of experts for both roles puts the model's parameters to better use and improves overall performance without increasing the compute per token.
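The shared-expert idea can be illustrated with a toy sketch. This is not the paper's implementation: the routing scheme (per-token top-k with softmax gating), the two-layer ReLU experts, and the stand-in token-mixing matrix are all simplifying assumptions. The point it demonstrates is that one expert pool (W1, W2) serves two different layers, each with its own router.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K, HIDDEN = 8, 4, 2, 16  # toy sizes, chosen for illustration

# One SHARED pool of experts: each expert is a small two-layer ReLU MLP.
W1 = rng.standard_normal((N_EXPERTS, D, HIDDEN)) * 0.1
W2 = rng.standard_normal((N_EXPERTS, HIDDEN, D)) * 0.1

def moe_layer(x, router_w):
    """Route each token to its top-k experts from the shared pool."""
    logits = x @ router_w                             # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -TOP_K:]    # indices of selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        gate = np.exp(logits[t, sel])
        gate /= gate.sum()                            # softmax over the chosen experts
        for g, e in zip(gate, sel):
            h = np.maximum(x[t] @ W1[e], 0.0)         # expert forward pass
            out[t] += g * (h @ W2[e])
    return out

# Two separate routers, but the SAME expert weights serve both layers.
router_attn = rng.standard_normal((D, N_EXPERTS))
router_ffn = rng.standard_normal((D, N_EXPERTS))

tokens = rng.standard_normal((5, D))
mixed = (tokens @ (rng.standard_normal((D, D)) * 0.1))  # stand-in for attention's token mixing

y_attn = moe_layer(mixed, router_attn)   # experts applied on the attention path
y_ffn = moe_layer(tokens, router_ffn)    # the same experts reused as the FFN layer
```

Because W1 and W2 appear in both calls, the parameter count is roughly half that of a model with two disjoint expert sets, which is the efficiency the paper is after.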
Why does it matter?
This matters because sharing experts lets AI models become more powerful and more parameter-efficient at the same time, which can lead to faster, smarter, and more cost-effective AI systems across a wide range of applications.
Abstract
A novel Sparse Mixture of Experts (MoE) architecture unifies MoE designs in attention and feed-forward layers, enhancing model performance and enabling efficient parameter sharing.