
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing

Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu

2025-02-13


Summary

This paper talks about a new way to combine different AI language models into one super-smart model called Mediator. It's like taking the best parts of several smart robots and putting them together to make one really clever robot that can do many tasks well.

What's the problem?

When scientists try to merge different AI models, they often run into conflicts. It's like trying to mix oil and water - they don't always blend well. This can make the combined model perform worse than expected. Also, keeping all these different models separate takes up a lot of computer space and power, which is expensive and inefficient.

What's the solution?

The researchers created Mediator, which does a few clever things. First, it measures how much the corresponding layers of different AI models conflict with each other: layers that mostly agree get averaged into one, while layers that clash are kept separate and handled by a routing system that picks the best expert model for each task. They also found a way to shrink the models by separating the common knowledge from the specialized knowledge, kind of like having one general textbook plus smaller, subject-specific guidebooks. Lastly, Mediator estimates how sure it is about what kind of task an input is, and uses that uncertainty to choose and blend the right experts.
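To make the first idea concrete, here is a minimal sketch of the layer-by-layer decision between averaging and routing. The conflict measure (sign disagreement between task vectors) and the threshold value are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def conflict_score(task_vectors):
    """Fraction of parameters whose task vectors disagree in sign across models."""
    signs = np.sign(np.stack(task_vectors))    # shape: (num_models, num_params)
    agree = np.all(signs == signs[0], axis=0)  # True where every model agrees
    return 1.0 - agree.mean()

def merge_or_route(base, finetuned_layers, threshold=0.3):
    """Average a layer when conflicts are small; otherwise keep experts for routing."""
    task_vectors = [ft - base for ft in finetuned_layers]
    if conflict_score(task_vectors) < threshold:
        return ("averaged", base + np.mean(task_vectors, axis=0))
    return ("routed", finetuned_layers)  # resolved by the expert router at inference

base = np.zeros(4)
low_conflict = [np.array([0.1, 0.2, 0.1, 0.1]), np.array([0.2, 0.1, 0.1, 0.2])]
high_conflict = [np.array([0.5, -0.5, 0.4, -0.4]), np.array([-0.5, 0.5, -0.4, 0.4])]
print(merge_or_route(base, low_conflict)[0])   # averaged
print(merge_or_route(base, high_conflict)[0])  # routed
```

Averaged layers cost nothing extra at inference, so the router only has to handle the small number of layers where the models genuinely disagree.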

Why it matters?

This matters because it could make AI systems much smarter and more efficient. By combining the strengths of different AI models without needing as much computer power, we could have more capable AI assistants that can handle a wider range of tasks. This could lead to better AI help in fields like medicine, education, and scientific research, all while using less energy and resources.

Abstract

Model merging aggregates Large Language Models (LLMs) finetuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation in averaging. While model routing addresses this issue by selecting individual models during inference, it imposes excessive storage and compute costs, and fails to leverage the common knowledge from different models. In this work, we observe that different layers exhibit varying levels of parameter conflicts. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by task arithmetic sparsity, we decouple multiple fine-tuned experts into a dense expert and several sparse experts. Considering out-of-distribution samples, we select and merge appropriate experts based on the task uncertainty of the input data. We conduct extensive experiments on both LLaMA and Qwen with varying parameter scales, and evaluate on real-world reasoning tasks. Results demonstrate that our method consistently achieves significant performance improvements while incurring lower system costs than existing methods.
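The storage-saving idea in the abstract can be sketched as follows: experts are decoupled into one shared dense model plus small sparse residuals, and at inference the residuals are blended according to the router's task distribution. The function names, the top-k sparsification rule, and the keep ratio are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def decompose(experts, keep_ratio=0.1):
    """Split experts into a shared dense part (elementwise mean) plus
    per-expert sparse residuals keeping only the largest-magnitude entries."""
    dense = np.mean(experts, axis=0)
    sparse = []
    for w in experts:
        resid = w - dense
        k = max(1, int(keep_ratio * resid.size))
        cutoff = np.sort(np.abs(resid))[-k]  # k-th largest magnitude
        sparse.append(np.where(np.abs(resid) >= cutoff, resid, 0.0))
    return dense, sparse

def route(dense, sparse, task_probs):
    """Merge experts weighted by the router's task probabilities: a confident
    (low-uncertainty) input leans on one expert, an uncertain one blends several."""
    return dense + sum(p * s for p, s in zip(task_probs, sparse))

experts = [np.array([1.0, 0.0, 2.0, 0.0]), np.array([0.0, 1.0, 0.0, 2.0])]
dense, sparse = decompose(experts, keep_ratio=0.5)
merged = route(dense, sparse, task_probs=[0.9, 0.1])
print(merged.shape)  # (4,)
```

Only the dense part is stored at full size; each sparse residual can be kept in a compressed format, which is where the memory savings come from.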