Delta Activations: A Representation for Finetuned Large Language Models

Zhiqiu Xu, Amish Sethi, Mayur Naik, Ser-Nam Lim

2025-09-05

Summary

This paper introduces a new way to understand and organize the many finetuned versions of large language models that people have created by starting from a shared open-source base model.

What's the problem?

There are now tons of specialized language models available, but it's really hard to figure out what each one does or how they differ from each other because the information about them is messy and scattered. It's like having a huge library with no cataloging system.

What's the solution?

The researchers came up with a method called 'Delta Activations'. They measure how a model's internal activations *shift* when it's finetuned for a specific task, compared to the original base model. These shifts are turned into a kind of 'fingerprint' (a vector embedding) that represents the model. The fingerprints let them cluster similar models by domain and task, revealing how the model landscape is structured. The method works consistently across different finetuning setups, and the fingerprints even add up predictably when a model is trained on a mixture of datasets.
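The core idea can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the 'models' here are small random linear maps instead of real LLMs, and the probe inputs, shift magnitudes, and noise scale are made-up stand-ins. The structure is the same, though: embed each finetuned model as the shift in its mean activation over a fixed set of probe inputs, then compare embeddings with cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed probe inputs shared by all models. In the paper these are a small
# set of text prompts fed to each LLM; here they are random vectors.
PROBES = rng.normal(size=(8, 16))

def activations(weights):
    """Mean hidden activation of a toy one-layer 'model' over the probes."""
    return np.tanh(PROBES @ weights).mean(axis=0)

def delta_activation(finetuned, base):
    """Delta Activation: the activation shift relative to the base model."""
    return activations(finetuned) - activations(base)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

base = rng.normal(size=(16, 32))

# Simulate finetuning as a task-specific weight shift plus small noise:
# two models 'finetuned' on the same task, one on a different task.
shift_a = 0.5 * rng.normal(size=base.shape)
shift_b = 0.5 * rng.normal(size=base.shape)
model_a1 = base + shift_a + 0.01 * rng.normal(size=base.shape)
model_a2 = base + shift_a + 0.01 * rng.normal(size=base.shape)
model_b  = base + shift_b + 0.01 * rng.normal(size=base.shape)

emb_a1 = delta_activation(model_a1, base)
emb_a2 = delta_activation(model_a2, base)
emb_b  = delta_activation(model_b, base)

sim_same = cosine(emb_a1, emb_a2)  # same-task pair: near 1
sim_diff = cosine(emb_a1, emb_b)   # cross-task pair: much lower
print(f"same-task similarity: {sim_same:.3f}, cross-task: {sim_diff:.3f}")
```

In this toy setup the two models finetuned on the same simulated task get nearly identical fingerprints, while the model finetuned on a different task does not, which is the property that makes clustering by task possible.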

Why it matters?

This work is important because it makes it much easier to find and reuse existing language models. Instead of starting from scratch every time, researchers and developers can quickly identify models that are already good at the task they need, saving time and resources. It helps unlock the potential of all the publicly available models that have been created.

Abstract

The success of powerful open source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. This representation allows for effective clustering by domain and task, revealing structure in the model landscape. Delta Activations also demonstrate desirable properties: they are robust across finetuning settings and exhibit an additive property when finetuning datasets are mixed. In addition, we show that Delta Activations can embed tasks via few-shot finetuning, and further explore its use for model selection and merging. We hope Delta Activations can facilitate the practice of reusing publicly available models. Code is available at https://github.com/OscarXZQ/delta_activations.