No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces

Daniel Marczak, Simone Magistri, Sebastian Cygert, Bartłomiej Twardowski, Andrew D. Bagdanov, Joost van de Weijer

2025-02-10

Summary

This paper presents a new way to combine several task-specific AI models into a single multi-task model that performs well across all tasks, without losing the special skills of each original model.

What's the problem?

When scientists try to merge multiple AI models that are each good at specific tasks into one big model, the combined model often doesn't perform as well as the individual models did on their own tasks. This performance drop has been a big challenge in creating versatile AI systems.

What's the solution?

The researchers came up with a method called 'isotropic merging' that examines how the important directions (singular components) of each task's weight updates line up with those of the merged model. They flatten the spectrum of these components, equalizing their influence so that no single direction dominates. They also separate the subspace that is common to all tasks from the subspaces that are specific to each task. This helps the merged model keep the special abilities of each original model while also sharing what the tasks have in common.
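The flattening idea can be illustrated with a minimal NumPy sketch for a single weight matrix. This is not the authors' implementation; the function name and the mean-based flattening are illustrative assumptions about how a "flattened spectrum" might be realized:

```python
import numpy as np

def isotropic_merge(pretrained, task_weights):
    """Illustrative sketch of isotropic merging for one weight matrix.

    Sum the per-task weight updates ("task matrices"), flatten the
    singular value spectrum of the sum so no single direction
    dominates, and add the result back to the pre-trained weights.
    """
    # Task matrices: each task's weight update relative to the pre-trained model.
    task_matrices = [w - pretrained for w in task_weights]
    merged_update = sum(task_matrices)

    # Flatten the spectrum: keep the singular vectors, but replace all
    # singular values with their mean so every component contributes
    # equally (this mean-based choice is an assumption for illustration).
    u, s, vt = np.linalg.svd(merged_update, full_matrices=False)
    flat_s = np.full_like(s, s.mean())
    iso_update = (u * flat_s) @ vt

    return pretrained + iso_update
```

After this step, every singular value of the merged update is identical, which is what "isotropic" refers to: the update treats all of its component directions with equal weight.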

Why it matters?

This matters because it allows us to create more efficient and capable AI systems that can handle many different tasks without needing separate models for each one. It saves computer resources, makes AI more versatile, and opens up possibilities for AI to be used in more complex real-world situations where multiple skills are needed. Plus, it helps advance our understanding of how AI models can work together, which is crucial for developing smarter, more flexible AI in the future.

Abstract

Model merging integrates the weights of multiple task-specific models into a single multi-task model. Despite recent interest in the problem, a significant performance gap between the combined and single-task models remains. In this paper, we investigate the key characteristics of task matrices -- weight update matrices applied to a pre-trained model -- that enable effective merging. We show that alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement over the pre-trained model. Based on this, we propose an isotropic merging framework that flattens the singular value spectrum of task matrices, enhances alignment, and reduces the performance gap. Additionally, we incorporate both common and task-specific subspaces to further improve alignment and performance. Our proposed approach achieves state-of-the-art performance across multiple scenarios, including various sets of tasks and model scales. This work advances the understanding of model merging dynamics, offering an effective methodology to merge models without requiring additional training. Code is available at https://github.com/danielm1405/iso-merging .
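The alignment between singular components that the abstract correlates with performance can be sketched as a simple subspace-overlap score. The metric below is a common illustrative choice (overlap of top-k singular subspaces), not necessarily the exact measure used in the paper:

```python
import numpy as np

def subspace_alignment(task_matrix, merged_matrix, k=4):
    """Illustrative alignment score in [0, 1] between the top-k singular
    subspaces of a task matrix and a merged matrix (1 = fully aligned)."""
    ut, _, vtt = np.linalg.svd(task_matrix, full_matrices=False)
    um, _, vtm = np.linalg.svd(merged_matrix, full_matrices=False)
    # Squared Frobenius norm of the cross-projection, normalized by k,
    # measures how much of one top-k subspace lies inside the other.
    left = np.linalg.norm(um[:, :k].T @ ut[:, :k]) ** 2 / k
    right = np.linalg.norm(vtm[:k] @ vtt[:k].T) ** 2 / k
    return 0.5 * (left + right)
```

A matrix is perfectly aligned with itself (score 1.0), while unrelated random matrices score much lower; the paper's observation is that higher alignment of this kind correlates with a smaller gap between the merged and single-task models.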