DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria

2024-06-24

Summary

This paper introduces a technique called DELLA-Merging, which improves the process of combining multiple fine-tuned machine learning models into one. It reduces interference between the merged models by ranking each model's parameters by magnitude and stochastically dropping the less important ones before merging.

What's the problem?

As more specialized models are developed for different tasks, combining these models into a single multitasking model can lead to problems. When models are merged, they can interfere with each other, causing performance issues. Existing methods for merging do not effectively manage this interference, leading to suboptimal results.

What's the solution?

The authors propose DELLA-Merging, which uses a pruning technique named MAGPRUNE. This method first ranks the model's parameters by importance, measured by their magnitude, and assigns higher drop probabilities to lower-magnitude parameters, so the least important ones are the most likely to be dropped. After dropping, MAGPRUNE rescales each surviving parameter by 1/(1 - p), where p is its drop probability, so that its expected value matches the original and the model's overall behavior is preserved. In tests merging three expert models (for language, math, and coding tasks), DELLA-Merging outperformed previous merging methods.
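The drop-and-rescale idea can be sketched in a few lines of NumPy. This is an illustrative sketch only: the linear interpolation of drop probabilities between `p_min` and `p_max` by magnitude rank is an assumption for demonstration, not the paper's exact probability schedule, and `magprune` is a hypothetical helper name.

```python
import numpy as np

def magprune(delta, p_min=0.1, p_max=0.9, rng=None):
    """Sketch of magnitude-based stochastic pruning (MAGPRUNE-style).

    Ranks parameters by magnitude, gives lower-magnitude parameters
    higher drop probabilities, drops them at random, and rescales
    survivors by 1/(1 - p) so each parameter's expected value is
    unchanged. The linear p_min..p_max schedule is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    flat = delta.ravel()
    # Rank by magnitude: rank 0 = smallest-magnitude parameter.
    ranks = np.argsort(np.argsort(np.abs(flat)))
    # Lower magnitude (lower rank) -> higher drop probability p.
    p = p_max - (p_max - p_min) * ranks / max(len(flat) - 1, 1)
    keep = rng.random(len(flat)) >= p
    # Rescale survivors so E[pruned] equals the original value.
    pruned = np.where(keep, flat / (1.0 - p), 0.0)
    return pruned.reshape(delta.shape)
```

In DELLA, pruning of this kind is applied to each expert's delta parameters (the difference between its fine-tuned weights and the shared base weights) before the pruned deltas are combined into one model.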

Why it matters?

This research is significant because it provides a more effective way to merge machine learning models, which can enhance their ability to perform multiple tasks without needing extensive retraining. By improving model merging techniques, developers can create more efficient and capable AI systems that utilize existing specialized models better.

Abstract

With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: https://github.com/declare-lab/della.
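The 1/(1 - p) rescaling in the abstract keeps each surviving parameter unbiased: a parameter w that survives with probability 1 - p and is scaled to w/(1 - p) has expected value w, which is why the pruned deltas still approximate the original ones. A quick numeric check of this expectation (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(42)
w, p, trials = 0.8, 0.6, 200_000  # example parameter and drop rate
# Drop w with probability p; rescale survivors by 1/(1 - p).
kept = rng.random(trials) >= p
samples = np.where(kept, w / (1.0 - p), 0.0)
print(samples.mean())  # ~0.8, matching the original parameter w
```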