
Autonomy-of-Experts Models

Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan

2025-01-23

Summary

This paper introduces Autonomy-of-Experts (AoE), a new way to make AI models more efficient and effective. It's like creating a team of AI specialists who can decide for themselves when they're the best fit for a job, instead of having a manager assign tasks.

What's the problem?

Current AI models that use a method called Mixture-of-Experts (MoE) rely on a 'router' that decides which part of the AI (called an expert) should handle each piece of the input (each token). This is like having a boss who doesn't really know what each employee is best at making decisions about who should do what. Because the router makes its decisions separately from what the experts actually compute, tokens can be assigned to the wrong experts, and the experts don't learn as well as they could. A sketch of a conventional router follows for contrast.
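To make the contrast concrete, here is a minimal PyTorch sketch of a conventional MoE router. All names, dimensions, and the top-k gating scheme are illustrative assumptions, not the paper's code; the point is that a separate learned gate scores the experts without ever seeing what they would compute.

```python
# Hypothetical sketch of a standard MoE router (not the paper's code).
import torch
import torch.nn as nn

d_model, num_experts, top_k = 512, 8, 2
router = nn.Linear(d_model, num_experts)  # the "boss": a separate gating layer

x = torch.randn(16, d_model)              # a batch of 16 token embeddings
gate_probs = router(x).softmax(dim=-1)    # the router's guess about each expert
weights, expert_idx = gate_probs.topk(top_k, dim=-1)
# Each token is dispatched to its top-k experts with these weights,
# based only on the router's judgment, never on what the experts
# would actually compute for the token.
print(expert_idx[0], weights[0])
```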

What's the solution?

The researchers created AoE, which removes the router entirely: each expert decides for itself whether it's suited to a given input. It's like letting employees choose their own projects based on what they know they're good at. Concretely, every expert pre-computes its internal activation for a token, and the size (norm) of that activation signals how capable the expert is; only the experts with the largest activations carry out the full computation, while the rest stop early. To keep this self-check cheap, the authors factorize the expert weights into low-rank matrices, so scoring a token costs far less than a full forward pass. A minimal sketch of the selection step follows.

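Here is a minimal PyTorch sketch of what that selection could look like. Everything here (the class name, dimensions, the ReLU nonlinearity, and unweighted mixing of expert outputs) is an illustrative assumption; the sketch only captures the core idea from the abstract: experts pre-compute activations, are ranked by activation norm, and only the top-ranked ones finish the forward pass.

```python
# Hedged sketch of Autonomy-of-Experts selection (not the authors' code).
import torch
import torch.nn as nn

class AoELayer(nn.Module):
    def __init__(self, d_model, d_hidden, num_experts, top_k):
        super().__init__()
        self.top_k = top_k
        # Each expert is a simple feed-forward block: up- then down-projection.
        self.w_up = nn.Parameter(torch.randn(num_experts, d_model, d_hidden) * 0.02)
        self.w_down = nn.Parameter(torch.randn(num_experts, d_hidden, d_model) * 0.02)

    def forward(self, x):  # x: (num_tokens, d_model)
        # Every expert pre-computes its internal activation for every token.
        h = torch.einsum('td,edh->eth', x, self.w_up)  # (experts, tokens, hidden)
        # An expert's activation norm serves as its self-assessed competence.
        norms = h.norm(dim=-1)                          # (experts, tokens)
        # Keep the top-k experts per token; the rest abort. No router involved.
        _, top_idx = norms.topk(self.top_k, dim=0)      # (top_k, tokens)
        out = torch.zeros_like(x)
        tok = torch.arange(x.size(0))
        for k in range(self.top_k):
            idx = top_idx[k]                 # winning expert for each token
            h_sel = torch.relu(h[idx, tok])  # finish the forward pass (ReLU assumed)
            out = out + torch.einsum('th,thd->td', h_sel, self.w_down[idx])
        return out

layer = AoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
print(layer(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```

Note that this naive version pre-computes the full hidden activation for every expert, which is expensive; the low-rank factorization mentioned above (sketched after the abstract) is what keeps the self-check cheap.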

Why does it matter?

This matters because it could make AI systems work better and faster. By letting the parts of the AI that are best suited for a task handle it, we can get better results without needing more computing power. This could lead to smarter AI assistants, more efficient language translation, and better performance across many AI tasks. It's a step towards making AI not just bigger, but smarter in how it uses its resources.

Abstract

Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router's decision-making and the experts' execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy-of-Experts (AoE), a novel MoE paradigm in which experts autonomously select themselves to process inputs. AoE is based on the insight that an expert is aware of its own capacity to effectively process a token, an awareness reflected in the scale of its internal activations. In AoE, routers are removed; instead, experts pre-compute internal activations for inputs and are ranked based on their activation norms. Only the top-ranking experts proceed with the forward pass, while the others abort. The overhead of pre-computing activations is reduced through a low-rank weight factorization. This self-evaluating-then-partner-comparing approach ensures improved expert selection and effective learning. We pre-train language models with 700M to 4B parameters, demonstrating that AoE outperforms traditional MoE models with comparable efficiency.
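The abstract's "low-rank weight factorization" can be illustrated with a short, hedged sketch: if each expert's up-projection is factorized as W_up ≈ A @ B with a small inner rank r, a ranking signal can be computed from the cheap r-dimensional intermediate x @ A, and that cached intermediate is reused by whichever expert wins. The dimensions here, and the choice to rank on the intermediate's norm, are assumptions made for illustration rather than details confirmed by the paper.

```python
# Hedged sketch of reducing pre-computation cost via low-rank factorization.
import torch

d_model, d_hidden, rank, num_experts, num_tokens = 512, 2048, 64, 8, 16
x = torch.randn(num_tokens, d_model)
A = torch.randn(num_experts, d_model, rank) * 0.02   # low-rank factor 1
B = torch.randn(num_experts, rank, d_hidden) * 0.02  # low-rank factor 2

# Cheap self-check: every expert projects tokens into rank-r space only.
z = torch.einsum('td,edr->etr', x, A)   # (experts, tokens, rank)
scores = z.norm(dim=-1)                 # ranking signal, O(d_model * rank) each

# Only the winning expert per token pays for the full d_hidden activation,
# reusing the cached low-rank intermediate z.
best = scores.argmax(dim=0)                      # top-1 expert per token
z_sel = z[best, torch.arange(num_tokens)]        # (tokens, rank)
h = torch.einsum('tr,trh->th', z_sel, B[best])   # (tokens, d_hidden)
print(h.shape)  # torch.Size([16, 2048])
```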