
DivMerge: A divergence-based model merging method for multi-tasking

Touayouch Brahim, Fosse Loïc, Damnati Géraldine, Lecorvé Gwénolé

2025-09-09

Summary

This paper explores a way to combine the knowledge from multiple AI models, each trained to do a different job, into a single, powerful model.

What's the problem?

When you try to merge AI models trained on different tasks, they can sometimes interfere with each other, causing performance to drop, especially as you add more and more tasks. It's like trying to learn multiple subjects at once and getting them confused. Existing methods for merging models don't always work well when dealing with a large number of tasks.

What's the solution?

The researchers developed a new merging method built around a mathematical measure called Jensen-Shannon divergence, which quantifies how different two probability distributions are. The divergence guides the merging process so that the combined model stays close to the behaviour of each individual expert, and it does so without requiring any additional labelled data. The method also automatically balances how much weight each task receives, ensuring that no single task dominates the merged model. Essentially, it's a smart way to combine the strengths of each individual model.
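The paper's exact procedure isn't reproduced here, but the core idea can be sketched in miniature. In this toy sketch (everything below is an illustrative assumption, not the authors' implementation), two "expert" models are plain linear classifiers, merging is a weighted average of their parameters, and the merging weight is picked by grid search to minimise the total Jensen-Shannon divergence between the merged model's predictions and each expert's predictions on unlabelled inputs:

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two categorical distributions
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b), axis=-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy "experts": random linear classifiers standing in for fine-tuned models
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))       # unlabelled inputs -- no labels needed
W_a = rng.normal(size=(8, 3))      # expert fine-tuned on task A
W_b = rng.normal(size=(8, 3))      # expert fine-tuned on task B

P_a = softmax(X @ W_a)             # each expert's predictive distribution
P_b = softmax(X @ W_b)

def merge_score(alpha):
    # Merged model = weighted average of expert parameters
    W_m = alpha * W_a + (1 - alpha) * W_b
    P_m = softmax(X @ W_m)
    # Summing divergence to BOTH experts keeps one task from dominating
    return js_divergence(P_m, P_a).mean() + js_divergence(P_m, P_b).mean()

# Grid search over the merging weight (a stand-in for the paper's optimisation)
alphas = np.linspace(0.0, 1.0, 21)
best_alpha = min(alphas, key=merge_score)
```

The key property this illustrates is that the objective is computed purely from model outputs on unlabelled data, so no ground-truth labels are ever consulted when choosing how to blend the models.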

Why it matters?

This research is important because it allows us to create more versatile AI systems that can handle a wider range of tasks efficiently. Instead of training a separate model for every single job, we can combine existing models, saving time and resources. This is particularly useful as we continue to develop more and more specialized AI models.

Abstract

Multi-task learning (MTL) is often achieved by merging datasets before fine-tuning, but the growing availability of fine-tuned models has led to new approaches such as model merging via task arithmetic. A major challenge in this setting is task interference, which worsens as the number of tasks increases. We propose a method that merges models trained on different tasks into a single model, maintaining strong performance across all tasks. Our approach leverages Jensen-Shannon divergence to guide the merging process without requiring additional labelled data, and automatically balances task importance. Unlike existing methods, our approach remains robust as the number of tasks grows and consistently outperforms prior work.