SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Zichong Li, Chen Liang, Zixuan Zhang, Ilgee Hong, Young Jin Kim, Weizhu Chen, Tuo Zhao
2025-06-24
Summary
This paper introduces SlimMoE, a method that compresses large Mixture of Experts (MoE) models into smaller, faster variants while preserving nearly all of the original performance.
What's the problem?
MoE models route each input through a set of specialized sub-networks called experts, so they carry a very large number of parameters. This makes them memory- and compute-hungry, and hard to deploy on smaller machines or under tight energy budgets.
What's the solution?
The researchers developed a multi-stage compression technique: it slims down each expert by removing parameters in a structured way, then uses knowledge distillation at each stage so the smaller model learns to match the original model's outputs, avoiding a full retraining from scratch.
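The distillation step can be sketched as follows. This is a minimal, framework-free illustration of temperature-scaled knowledge distillation (the slimmed student is trained to match the teacher's softened output distribution), not the paper's actual implementation; the function names and the temperature value are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing more of the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence from the teacher's soft targets to the student's
    # predictions, scaled by T^2 as in standard knowledge distillation.
    p = softmax(teacher_logits, temperature)  # teacher (soft targets)
    q = softmax(student_logits, temperature)  # student
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In practice this loss is computed per token over the vocabulary and minimized by gradient descent on the slimmed student's weights; when the student exactly matches the teacher, the loss is zero.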
Why does it matter?
Compression makes powerful AI models more accessible and efficient, letting them run faster and on less powerful hardware without losing much accuracy.
Abstract
SlimMoE compresses large MoE models into smaller, efficient variants using multi-stage compression without full retraining, maintaining competitive performance with significantly fewer resources.