Model Merging with Functional Dual Anchors

Kexuan Shi, Yandong Wen, Weiyang Liu

2025-10-27

Summary

This paper introduces a technique called Functional Dual Anchors, or FDAs, for combining several specialized versions of a large AI model, each fine-tuned for a different task. It's a way to get the benefits of multiple specialized models in a single model, without starting the training process all over again.

What's the problem?

When you have a powerful AI model and fine-tune it for different jobs, like translating languages or writing stories, simply averaging the changes made for each job doesn't always work well. The different updates can clash and actually make the combined model worse than the individual ones. Existing methods try to fix this by adjusting the model's internal settings, but they struggle when those settings aren't perfectly consistent between the different versions.
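The clash described above is easy to see in a toy example (illustrative only, not from the paper): when two tasks' parameter updates disagree in sign on some coordinate, naive averaging cancels them out, so neither task's adjustment survives there.

```python
import numpy as np

theta0 = np.zeros(3)                 # toy "pretrained" parameters
tau_a = np.array([1.0, -2.0, 0.5])   # update learned for task A
tau_b = np.array([1.0, 2.0, 0.5])    # update learned for task B

# Naive parameter-space merging: average the two updates.
merged = theta0 + (tau_a + tau_b) / 2

# The second coordinate, where the tasks disagree, cancels to zero:
# neither task's learned change survives in the merged model.
```

Real merging methods work in much higher dimensions, but the same kind of sign conflict is one source of the interference they try to mitigate.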

What's the solution?

Instead of directly changing the model's settings, FDAs focus on what the model *sees* as input. They create special, artificial inputs that, when used, nudge the model in the right direction for each specific task. Think of it like finding the perfect 'prompt' to activate the knowledge gained from each task. These artificial inputs capture how each task changes the model's behavior, and combining them leads to a more robust and flexible merged model. They also show that this method works well *with* existing techniques that adjust the model's settings.
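The core idea can be illustrated with a toy linear model. This is a simplification, not the paper's actual algorithm (which optimizes synthetic inputs numerically for deep networks): assuming a linear map with squared-error loss and a rank-one task vector, a "dual anchor" (a synthetic input-target pair whose induced gradient reproduces the task vector) can even be written in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained weights and a rank-one "task vector" (finetuned minus pretrained).
W0 = rng.normal(size=(3, 3))
u = rng.normal(size=3)
v = rng.normal(size=3)
tau = np.outer(u, v)          # W_finetuned = W0 + tau

# Functional dual anchor: a synthetic input x and target y chosen so that
# the gradient of the loss 0.5 * ||W x - y||^2 at W0 equals exactly -tau.
x = v
y = W0 @ x + u                # residual at W0 is (W0 @ x - y) = -u

grad = np.outer(W0 @ x - y, x)   # = -outer(u, v) = -tau
W_merged = W0 - grad             # one gradient step recovers W0 + tau
```

Roughly speaking, the paper finds such anchors by optimization so that their induced gradients align with each task's vector; merging then amounts to taking gradient steps on the pretrained model over all tasks' anchors together, rather than averaging parameters directly.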

Why it matters?

This is important because it provides a more reliable and adaptable way to combine AI models. It means we can leverage the expertise gained from training models on many different tasks without losing performance. This could lead to more versatile AI systems that can handle a wider range of problems, and it's more efficient than retraining a model from scratch every time you want to add a new skill.

Abstract

Model merging is an efficient post-training strategy for integrating knowledge from multiple finetuned checkpoints of a shared foundation model. Existing methods operate in the parameter space, combining task vectors to mitigate conflicts, but remain constrained by parameter inconsistencies. We propose Functional Dual Anchors (FDAs), a framework that instead models the input-representation space. FDAs are synthetic inputs whose induced gradients align with task vectors, capturing task-specific functional shifts relative to the pretrained model. This perspective bridges joint multi-task training and post-hoc merging, offering both robustness and flexibility. We further introduce a principled initialization scheme and show that FDAs are complementary to parameter-space model merging. Comprehensive experiments demonstrate the effectiveness of FDAs in model merging.