Understanding and Enforcing Weight Disentanglement in Task Arithmetic
Shangge Liu, Yuehan Yin, Lei Wang, Qi Fan, Yinghuan Shi, Wenbin Li, Yang Gao, Dacheng Tao
2026-04-22
Summary
This paper investigates why a technique called 'task arithmetic' works so well for editing AI models without needing to retrain them completely. It proposes a core idea, 'Task-Feature Specialization', to explain this success and then develops a new method, 'OrthoReg', to make task arithmetic even better.
What's the problem?
Task arithmetic is a clever way to modify what AI models do, but scientists didn't fully understand *why* it worked. The idea of 'weight disentanglement' – where different tasks don't interfere with each other inside the model – described *what* happens, but not *how* it happens. Specifically, researchers didn't know what properties of the original model or the task instructions allowed for this clean separation of tasks.
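To make the basic mechanic concrete, here is a minimal sketch of task arithmetic using NumPy arrays as stand-ins for model weights. All names, shapes, and the scaling coefficient are illustrative, not from the paper.

```python
import numpy as np

# Illustrative stand-ins for model weights (a real model has many such matrices).
rng = np.random.default_rng(0)
theta_0 = rng.normal(size=(4, 4))                        # pre-trained weights
theta_a = theta_0 + rng.normal(scale=0.1, size=(4, 4))   # fine-tuned on task A
theta_b = theta_0 + rng.normal(scale=0.1, size=(4, 4))   # fine-tuned on task B

# A task vector is the weight delta produced by fine-tuning on one task.
tau_a = theta_a - theta_0
tau_b = theta_b - theta_0

# Editing by arithmetic: add scaled task vectors to the pre-trained weights.
# lam is a hypothetical scaling coefficient controlling edit strength.
lam = 0.5
theta_multi = theta_0 + lam * (tau_a + tau_b)
```

Weight disentanglement is the property that the `tau_a` contribution changes behavior only on task A and `tau_b` only on task B, so the sum does not cause interference.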
What's the solution?
The researchers introduced the concept of 'Task-Feature Specialization' (TFS), which means the model dedicates specific parts of its internal workings to different tasks. They proved that if a model has TFS, it will naturally show weight disentanglement. They also discovered that TFS leads to a measurable pattern: the 'weight vectors' for different tasks become perpendicular (orthogonal) to each other. Because TFS is hard to directly control, they created 'OrthoReg', a technique that *forces* these weight vectors to be orthogonal during model updates. They mathematically showed that making the weight updates orthogonal promotes disentanglement and improves performance.
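As a rough illustration of what "forcing weight updates to be orthogonal" can look like, the sketch below penalizes the off-diagonal entries of the Gram matrix of a weight update, which is zero exactly when the update's columns are mutually orthogonal. This is a generic orthogonality regularizer written for illustration; the paper's actual OrthoReg loss may be defined differently.

```python
import numpy as np

def ortho_penalty(dW: np.ndarray) -> float:
    """Sum of squared off-diagonal entries of dW^T @ dW.

    Equals zero exactly when the columns of the weight update dW are
    mutually orthogonal; adding this term to a fine-tuning loss pushes
    updates toward an internal orthogonal structure.
    """
    gram = dW.T @ dW
    off_diag = gram - np.diag(np.diag(gram))
    return float(np.sum(off_diag ** 2))

# Orthogonal columns incur no penalty; correlated columns are penalized.
penalty_orth = ortho_penalty(np.eye(3))        # identity: columns orthogonal
penalty_corr = ortho_penalty(np.ones((3, 3)))  # identical columns: heavily penalized
```

In practice such a term would be added (with a weighting hyperparameter) to the task loss during fine-tuning, steering each update `dW` toward the orthogonal structure that, per the paper's analysis, promotes disentanglement.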
Why it matters?
Understanding why task arithmetic works allows us to improve it and apply it more reliably. By identifying TFS and developing OrthoReg, this research provides a theoretical foundation for task arithmetic and a practical method to enhance its effectiveness. This means we can more easily and efficiently customize AI models for new tasks without the huge cost of full retraining, which is a big step forward in making AI more adaptable and accessible.
Abstract
Task arithmetic provides an efficient, training-free way to edit pre-trained models, yet lacks a fundamental theoretical explanation for its success. The existing concept of "weight disentanglement" describes the ideal outcome of non-interfering task composition but does not reveal its underlying cause. Crucially, what intrinsic properties of the pre-trained model (θ_0) or the task vectors (τ_t) enable this disentanglement remains underexplored. In this paper, we introduce Task-Feature Specialization (TFS), a model's ability to allocate distinct internal features to different tasks, as the fundamental principle. We first prove that TFS is a sufficient condition for weight disentanglement. More importantly, we find that TFS also gives rise to an observable geometric consequence: weight vector orthogonality. This positions TFS as the common cause of both the desired functional outcome (disentanglement) and a measurable geometric property (orthogonality). This relationship provides the key insight for our method: since the abstract TFS property is intractable to enforce directly, we can instead promote weight disentanglement by shaping its concrete geometric consequence, orthogonality. Therefore, we propose OrthoReg, a simple and effective regularization method that actively enforces an internal orthogonal structure on the weight updates (ΔW) that constitute τ_t during fine-tuning. We theoretically prove that OrthoReg promotes disentanglement. Extensive experiments demonstrate that OrthoReg consistently and significantly enhances the performance of various task arithmetic methods. Code is available at https://github.com/RL-MIND/OrthoReg.