
The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu

2026-04-10


Summary

This paper explores whether useful skills learned by one AI model can be 'copied' and applied to other AI models, even if those models are different sizes, without any additional training.

What's the problem?

Large language models (LLMs) are often improved after their initial training through a process called 'post-training,' where they learn specific skills. This post-training, however, is expensive and time-consuming. The question is whether those learned skills can be transferred to *other* models, especially smaller ones, without repeating the costly post-training process. It's like teaching a new student a trick that another student already knows, without having to re-teach the whole class.

What's the solution?

The researchers proposed an idea called the 'Master Key Hypothesis,' which suggests that skills correspond to specific directions in a low-dimensional subspace of a model's internal activations. They developed a method called UNLOCK that extracts such a direction by contrasting the activations of two Source model variants: one that *has* the skill and one that *doesn't*. The extracted direction is then aligned to a Target model through a low-rank linear transformation and added to the Target's activations while it runs. This transfer doesn't change the Target model's weights; it subtly steers how the model uses its existing knowledge at inference time.
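The three steps described above (extract a direction by contrasting activations, align it with a low-rank linear map, add it at inference time) can be sketched roughly as follows. This is a minimal illustration with random placeholder activations, not the paper's implementation; all dimensions, the least-squares alignment fit, and the `steer` helper are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n, rank = 64, 48, 200, 8  # hypothetical hidden sizes / sample count

# Hidden-state activations on the same prompts from two Source variants:
# one with the capability (e.g. post-trained) and one without (base).
acts_with = rng.normal(size=(n, d_src))
acts_without = rng.normal(size=(n, d_src))

# Step 1: capability direction = mean activation difference, normalized.
direction = (acts_with - acts_without).mean(axis=0)
direction /= np.linalg.norm(direction)

# Step 2: a linear map from Source to Target activation space, fit here by
# least squares on paired activations, then truncated to low rank via SVD.
src_acts = rng.normal(size=(n, d_src))
tgt_acts = rng.normal(size=(n, d_tgt))
W, *_ = np.linalg.lstsq(src_acts, tgt_acts, rcond=None)  # shape (d_src, d_tgt)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = (U[:, :rank] * S[:rank]) @ Vt[:rank]

# Step 3: map the direction into the Target space and add it (scaled by
# alpha) to the Target's hidden state at inference time -- no weight updates.
tgt_direction = direction @ W_lowrank

def steer(hidden_state, alpha=1.0):
    return hidden_state + alpha * tgt_direction

h = rng.normal(size=d_tgt)       # a Target hidden state
steered = steer(h)               # same shape, nudged along the capability direction
```

In practice the steering would be applied to the residual stream of chosen layers during generation, and `alpha` would be tuned to amplify the behavior without degrading fluency.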

Why it matters?

This research is important because it offers a way to significantly improve the performance of existing AI models, particularly smaller ones, without the need for expensive retraining. This could make advanced AI capabilities more accessible and efficient, allowing more people to benefit from powerful AI tools. The results show substantial gains on reasoning tasks, including mathematical problem solving and 'Chain-of-Thought' reasoning, demonstrating the potential of this transfer approach.

Abstract

We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.