Tracking Universal Features Through Fine-Tuning and Model Merging
Niels Horn, Desmond Elliott
2024-10-17

Summary
This paper studies how interpretable features in a small Transformer language model emerge, disappear, and persist when the model is fine-tuned on new domains of text and when the resulting fine-tuned models are merged, using sparse auto-encoders to identify and track those features.
What's the problem?
Fine-tuning and model merging are standard ways of adapting language models to new domains, but it is not well understood what happens to the features a model has already learned during these procedures: which features survive adaptation, which disappear, and which new ones emerge. Without this understanding, it is hard to reason about the stability of what a model knows across typical transfer-learning workflows.
What's the solution?
The authors train a base one-layer Transformer language model on a combination of the BabyLM corpus and Python code from The Stack. They then fine-tune this base model on two new domains, TinyStories and the Lua programming language, and merge the two fine-tuned models using spherical linear interpolation (SLERP). Sparse auto-encoders trained on the models' activations are used to identify features and track how they change across fine-tuning and merging.
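To make the feature-extraction setup concrete, here is a minimal sketch of a sparse auto-encoder of the kind used to find features in a model's activations; the dictionary size, activation function, and L1 coefficient are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal sparse auto-encoder (SAE) sketch for extracting features from
# model activations. Hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activations -> feature space
        self.decoder = nn.Linear(d_dict, d_model)  # feature space -> reconstruction

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))     # sparse, non-negative feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(x, recon, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the feature activations.
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()
```

Separate auto-encoders trained on the base, fine-tuned, and merged models would then allow features to be compared across checkpoints, for example by correlating their activations on the same inputs.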
Why it matters?
Understanding how features behave under fine-tuning and merging sheds light on the stability and transformation of what language models learn during common transfer-learning scenarios. Because the study works with small-scale models and sparse auto-encoders, it offers a controlled setting for examining which features are universal enough to persist across domains, which could inform how larger models are adapted and combined.
Abstract
We study how features emerge, disappear, and persist across models fine-tuned on different domains of text. More specifically, we start from a base one-layer Transformer language model that is trained on a combination of the BabyLM corpus and a collection of Python code from The Stack. This base model is adapted to two new domains of text, TinyStories and the Lua programming language, respectively, and the two resulting models are merged using spherical linear interpolation. Our exploration aims to provide deeper insights into the stability and transformation of features across typical transfer-learning scenarios using small-scale models and sparse auto-encoders.
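As an illustration of the merging step, the following is a minimal sketch of parameter-wise spherical linear interpolation (SLERP) between two checkpoints with identical architecture; the function names and the choice to flatten each parameter tensor are assumptions made for the example, not details from the paper.

```python
# Sketch of parameter-wise SLERP merging of two checkpoints with the same
# architecture. Names and structure are illustrative assumptions.
import torch

def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    """Spherically interpolate between two weight tensors of the same shape."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    # Angle between the two weight vectors.
    cos_omega = torch.clamp((a / (a.norm() + eps)) @ (b / (b.norm() + eps)), -1.0, 1.0)
    omega = torch.arccos(cos_omega)
    if omega.abs() < eps:
        # Nearly parallel weights: fall back to plain linear interpolation.
        merged = (1 - t) * a + t * b
    else:
        merged = (torch.sin((1 - t) * omega) * a + torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.view_as(w_a).to(w_a.dtype)

def merge_models(state_a: dict, state_b: dict, t: float = 0.5) -> dict:
    """Merge two state dicts with identical keys and shapes, parameter by parameter."""
    return {name: slerp(state_a[name], state_b[name], t) for name in state_a}
```

With t = 0.5 this produces the midpoint on the sphere between the two fine-tuned models' weights, which can then be loaded back into a model of the same architecture for analysis.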