Do LLMs "Feel"? Emotion Circuits Discovery and Control
Chenxi Wang, Yixuan Zhang, Ruiji Yu, Yufei Zheng, Lang Gao, Zirui Song, Zixiang Xu, Gus Xia, Huishuai Zhang, Dongyan Zhao, Xiuying Chen
2025-10-20
Summary
This research investigates how large language models (LLMs) handle and express emotions in the text they generate. It asks whether LLMs have built-in mechanisms for processing emotion, what those mechanisms look like internally, and whether the emotions expressed in their output can be controlled.
What's the problem?
As LLMs become more sophisticated, people want them to be not just factually correct but also emotionally intelligent – able to understand and express emotions appropriately. However, it's a mystery how LLMs *actually* produce emotional responses in their text. We don't know whether they have dedicated internal mechanisms for emotion, and even if they do, we don't know how to control them. Existing approaches, like simply prompting the model to 'be happy', aren't very reliable.
What's the solution?
The researchers built a controlled dataset, SEV (Scenario-Event with Valence), designed to elicit comparable internal states across emotions. They then analyzed the model's inner workings – individual neurons and attention heads – to identify components that consistently relate to specific emotions, essentially mapping out 'emotion circuits' within the model, and confirmed those components' causal roles by ablating and enhancing them. Finally, they directly modulated these circuits to control the emotional tone of the generated text, reaching 99.65% accuracy in expressing the desired emotion.
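The first step – extracting a context-agnostic emotion direction from hidden states – can be sketched with toy data. This is an illustrative difference-of-means construction, not the paper's actual procedure or activations; the array sizes, the planted direction, and the `steer` helper are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16  # toy hidden-state width (real models use thousands of dimensions)

# Toy stand-ins for hidden states collected from paired prompts:
# the same scenarios, differing only in the emotion being expressed.
true_dir = np.zeros(HIDDEN)
true_dir[0] = 1.0                              # pretend emotion lives along one axis
neutral_acts = rng.normal(size=(100, HIDDEN))  # hidden states on neutral prompts
joy_acts = neutral_acts + 2.0 * true_dir       # "joy" shifts activations along that axis

def emotion_direction(emotion_acts, neutral_acts):
    """Context-agnostic direction: difference of mean activations, unit-normalized."""
    d = emotion_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(hidden_state, direction, strength=2.0):
    """Nudge a hidden state along the emotion direction at inference time."""
    return hidden_state + strength * direction

direction = emotion_direction(joy_acts, neutral_acts)
print(round(float(direction @ true_dir), 2))  # prints 1.0: recovers the planted axis
```

Because the paired prompts share everything except the emotion, averaging cancels the scenario-specific variation and leaves the shared emotion component – the intuition behind "consistent, cross-context encoding".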
Why it matters?
This study is important because it is, by the authors' account, the first to systematically identify and validate how emotions are processed inside LLMs. This gives us a much better understanding of how these models work, making them more interpretable. More importantly, it provides a way to reliably control the emotional content of the text they generate, which is crucial for building truly emotionally intelligent AI systems that can communicate effectively and appropriately.
Abstract
As the demand for emotional intelligence in large language models (LLMs) grows, a key challenge lies in understanding the internal mechanisms that give rise to emotional expression and in controlling emotions in generated text. This study addresses three core questions: (1) Do LLMs contain context-agnostic mechanisms shaping emotional expression? (2) What form do these mechanisms take? (3) Can they be harnessed for universal emotion control? We first construct a controlled dataset, SEV (Scenario-Event with Valence), to elicit comparable internal states across emotions. Subsequently, we extract context-agnostic emotion directions that reveal consistent, cross-context encoding of emotion (Q1). We identify neurons and attention heads that locally implement emotional computation through analytical decomposition and causal analysis, and validate their causal roles via ablation and enhancement interventions. Next, we quantify each sublayer's causal influence on the model's final emotion representation and integrate the identified local components into coherent global emotion circuits that drive emotional expression (Q2). Directly modulating these circuits achieves 99.65% emotion-expression accuracy on the test set, surpassing prompting- and steering-based methods (Q3). To our knowledge, this is the first systematic study to uncover and validate emotion circuits in LLMs, offering new insights into interpretability and controllable emotional intelligence.
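The ablation and enhancement interventions mentioned in the abstract have a simple mechanical core: silence or amplify the candidate components and observe whether emotional expression changes. A minimal sketch, assuming neurons are indexed positions in a hidden-state vector (the helper names and the 3.0 scale factor are illustrative, not the paper's):

```python
import numpy as np

def ablate_neurons(hidden_state, neuron_idx):
    """Zero-ablation: silence candidate emotion neurons to test their causal role."""
    out = hidden_state.copy()
    out[..., neuron_idx] = 0.0
    return out

def enhance_neurons(hidden_state, neuron_idx, scale=3.0):
    """Enhancement: amplify the same neurons' activations."""
    out = hidden_state.copy()
    out[..., neuron_idx] *= scale
    return out

h = np.ones(8)                       # toy hidden state
ablated = ablate_neurons(h, [0, 3])  # positions 0 and 3 set to 0.0
enhanced = enhance_neurons(h, [0, 3])  # positions 0 and 3 scaled to 3.0
```

If emotional expression degrades under ablation and strengthens under enhancement while other behavior is preserved, that is evidence the selected components causally implement the emotional computation rather than merely correlating with it.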