Attention Heads of Large Language Models: A Survey
Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Bo Tang, Feiyu Xiong, Zhiyu Li
2024-09-06

Summary
This paper is a survey of research on attention heads in large language models (LLMs), focusing on how these internal components help the models understand and process information.
What's the problem?
Large language models such as ChatGPT are very powerful but often behave like 'black boxes': we don't fully understand how they work internally. This lack of understanding makes it hard to improve their performance or fix issues, because researchers can't see how the models reason or make decisions.
What's the solution?
The authors explore the inner workings of LLMs by studying attention heads, key components that help a model focus on the most relevant parts of its input. They propose a framework modeled on the human thought process, broken into four stages: recalling knowledge, identifying context, reasoning, and preparing to express ideas. Using this framework, they review existing research to categorize the functions of specific attention heads and summarize the experimental methods used to discover them. They also discuss the limitations of current research and suggest directions for future study.
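To make "attention head" concrete, here is a minimal sketch of how per-head attention patterns can be inspected in practice. It assumes the Hugging Face transformers library and the public GPT-2 checkpoint; the survey itself does not prescribe any particular tooling, so this is only an illustrative example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a small public model and ask it to return attention weights.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)
model.eval()

inputs = tokenizer("Paris is the capital of", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each shaped (batch, num_heads, seq_len, seq_len).
last_layer = outputs.attentions[-1][0]        # (num_heads, seq_len, seq_len)
final_token_attention = last_layer[:, -1, :]  # each head's focus at the last position

for head_idx, weights in enumerate(final_token_attention):
    top_pos = int(weights.argmax())
    top_token = tokenizer.decode(inputs["input_ids"][0][top_pos])
    print(f"head {head_idx}: attends most to token {top_token!r}")

Inspecting which earlier tokens each head attends to is the kind of raw observation that the methods surveyed here (both Modeling-Free and Modeling-Required) build on when assigning functional roles to specific heads.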
Why it matters?
This research matters because it helps demystify how large language models operate, which can lead to better model designs and targeted improvements. Understanding attention heads can guide how we build and use these models, making them more reliable and effective for tasks such as answering questions or generating text.
Abstract
Since the advent of ChatGPT, Large Language Models (LLMs) have excelled in various tasks but remain largely as black-box systems. Consequently, their development relies heavily on data-driven approaches, limiting performance enhancement through changes in internal architecture and reasoning pathways. As a result, many researchers have begun exploring the potential internal mechanisms of LLMs, aiming to identify the essence of their reasoning bottlenecks, with most studies focusing on attention heads. Our survey aims to shed light on the internal reasoning processes of LLMs by concentrating on the interpretability and underlying mechanisms of attention heads. We first distill the human thought process into a four-stage framework: Knowledge Recalling, In-Context Identification, Latent Reasoning, and Expression Preparation. Using this framework, we systematically review existing research to identify and categorize the functions of specific attention heads. Furthermore, we summarize the experimental methodologies used to discover these special heads, dividing them into two categories: Modeling-Free methods and Modeling-Required methods. Also, we outline relevant evaluation methods and benchmarks. Finally, we discuss the limitations of current research and propose several potential future directions. Our reference list is open-sourced at https://github.com/IAAR-Shanghai/Awesome-Attention-Heads.