
Artificial Entanglement in the Fine-Tuning of Large Language Models

Min Chen, Zihan Wang, Canyu Chen, Zeguan Wu, Manling Li, Junyu Liu

2026-01-13


Summary

This paper investigates why parameter-efficient fine-tuning (PEFT), a technique for quickly adapting large language models, actually works so well. The authors examine it through the lens of quantum information theory, looking at how information is structured and connected inside the model's parameters.

What's the problem?

Large language models are powerful, but fully retraining them for each new task is incredibly expensive and time-consuming. PEFT methods offer a solution by only updating a small portion of the model's parameters, but it's not entirely clear *why* this limited updating still leads to good performance. The researchers wanted to understand the underlying structure of these updates and how they relate to the model's overall ability to learn.
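To make the cost difference concrete, here is a minimal NumPy sketch of a LoRA-style low-rank update. The shapes and names are illustrative assumptions, not taken from the paper; the point is only that the trainable factors are a tiny fraction of the frozen weight matrix.

```python
import numpy as np

# Frozen pretrained weight (e.g., a square attention projection matrix).
d = 4096
rank = 8  # LoRA rank r, with r << d

W = np.random.randn(d, d)              # frozen, never updated
A = np.random.randn(rank, d) * 0.01    # trainable low-rank factor
B = np.zeros((d, rank))                # trainable, initialized to zero

# Effective weight at inference time: the frozen weight plus the
# low-rank update B @ A.
W_eff = W + B @ A

full_params = W.size
lora_params = A.size + B.size
print(f"full fine-tuning trains {full_params:,} parameters")
print(f"LoRA trains {lora_params:,} parameters "
      f"({lora_params / full_params:.2%} of full)")
```

With these (hypothetical) sizes, LoRA trains roughly 0.4% of the parameters that full fine-tuning would update in this single matrix.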

What's the solution?

The researchers used concepts from quantum information theory, particularly 'entanglement,' to analyze the changes made during PEFT. They focused on a popular PEFT method called LoRA and compared it against full fine-tuning, in which every parameter is retrained. They measured what they call 'artificial entanglement' within the model's parameter updates, as well as how information flows between tokens in the attention layers. They found that LoRA creates a distinctive pattern of internal connections (an 'Entanglement Valley') that differs from full fine-tuning, but, surprisingly, the attention outputs, which reflect how the model relates tokens to one another when processing text, look very similar either way. They draw a parallel to the No-Hair Theorem in black hole physics: the specific internal details of LoRA's updates don't ultimately matter, as long as the externally visible behavior stays the same.
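One generic way to make "entanglement entropy of parameters" concrete is to treat a weight (or update) matrix as a bipartite state: take its singular value decomposition and compute the von Neumann entropy of the normalized squared singular values (the Schmidt spectrum). This is a simplified sketch of that standard calculation, not the paper's exact MPS-based pipeline; the matrices below are illustrative.

```python
import numpy as np

def entanglement_entropy(M: np.ndarray) -> float:
    """Von Neumann entropy of the Schmidt spectrum of matrix M."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values
    p = s**2 / np.sum(s**2)                 # normalized Schmidt coefficients
    p = p[p > 1e-12]                        # drop numerical zeros
    return float(-np.sum(p * np.log(p)))

rng = np.random.default_rng(0)

# A rank-1 update has a single Schmidt coefficient, so its
# entanglement entropy is (numerically) zero.
rank1 = np.outer(rng.standard_normal(64), rng.standard_normal(64))
print(entanglement_entropy(rank1))

# A dense Gaussian random matrix spreads weight across many singular
# values, giving entropy close to the maximum log(64).
dense = rng.standard_normal((64, 64))
print(entanglement_entropy(dense))
```

In this picture, a low-rank LoRA update is constrained to a few Schmidt coefficients, while a full fine-tuning update can spread across the whole spectrum, which is one intuition for why their internal entanglement signatures can differ.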

Why it matters?

This research provides a theoretical understanding of why PEFT methods are effective. By connecting these techniques to fundamental principles of information theory, it could lead to even more efficient ways to adapt large language models to new tasks, making them more accessible and practical for a wider range of applications. It also suggests a built-in robustness: the externally visible behavior (the attention outputs) is insensitive to specific hyperparameter settings, so these methods perform well across configurations.

Abstract

Large language models (LLMs) can be adapted to new tasks using parameter-efficient fine-tuning (PEFT) methods that modify only a small number of trainable parameters, often through low-rank updates. In this work, we adopt a quantum-information-inspired perspective to understand their effectiveness. From this perspective, low-rank parameterizations naturally correspond to low-dimensional Matrix Product States (MPS) representations, which enable entanglement-based characterizations of parameter structure. Thereby, we term and measure "Artificial Entanglement", defined as the entanglement entropy of the parameters in artificial neural networks (in particular the LLMs). We first study the representative low-rank adaptation (LoRA) PEFT method, alongside full fine-tuning (FFT), using LLaMA models at the 1B and 8B scales trained on the Tulu3 and OpenThoughts3 datasets, and uncover: (i) Internal artificial entanglement in the updates of query and value projection matrices in LoRA follows a volume law with a central suppression (termed as the "Entanglement Valley"), which is sensitive to hyper-parameters and is distinct from that in FFT; (ii) External artificial entanglement in attention matrices, corresponding to token-token correlations in representation space, follows an area law with logarithmic corrections and remains robust to LoRA hyper-parameters and training steps. Drawing a parallel to the No-Hair Theorem in black hole physics, we propose that although LoRA and FFT induce distinct internal entanglement signatures, such differences do not manifest in the attention outputs, suggesting a "no-hair" property that results in the effectiveness of low rank updates. We further provide theoretical support based on random matrix theory, and extend our analysis to an MPS Adaptation PEFT method, which exhibits qualitatively similar behaviors.