How Do Large Language Models Learn Concepts During Continual Pre-Training?
Barry Menglong Yao, Sha Li, Yunzhi Yao, Minqian Liu, Zaishuo Xia, Qifan Wang, Lifu Huang
2026-01-13
Summary
This research investigates how large language models, like the ones powering chatbots, learn and forget information about specific things – what we call 'concepts' like 'dog' or 'justice'. It looks at how these concepts are represented *inside* the model and how learning one concept affects learning others.
What's the problem?
We know LLMs are good at processing language, but it's a mystery how they actually build up understanding of the world. Specifically, we don't understand how they pick up new concepts while being continually trained on new data, which concepts they forget, and how learning one concept affects their ability to learn others. It's like trying to figure out how a student's brain organizes information while they're in class.
What's the solution?
The researchers examined the 'concept circuits' within the LLM. These are specific patterns of connections inside the model that seem to activate when the model is thinking about a particular concept. They used mathematical measurements of these circuits to track how concepts were learned, forgotten, and how they interacted with each other during ongoing training. By observing changes in these circuits, they could see which concepts were strengthening, weakening, and influencing each other.
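To make the idea of "mathematical measurements of circuits" concrete, here is a minimal sketch of computing simple graph metrics (node count, edge count, density, total edge weight) on a concept circuit represented as a weighted edge list, and comparing two training checkpoints. The edge lists and component names (`emb`, `attn.3`, etc.) are toy placeholders, not the paper's actual circuits, and the real circuit-extraction step (attribution/patching) is not shown.

```python
def circuit_metrics(edges):
    """Summarize a concept circuit given as weighted directed edges
    (src, dst, weight). Pure-Python graph metrics; no dependencies."""
    nodes = {n for src, dst, _ in edges for n in (src, dst)}
    n, e = len(nodes), len(edges)
    return {
        "num_nodes": n,
        "num_edges": e,
        # density of a directed graph: edges / possible edges
        "density": e / (n * (n - 1)) if n > 1 else 0.0,
        # aggregate edge strength, a crude proxy for circuit "activation"
        "total_weight": sum(w for _, _, w in edges),
    }

# Toy circuits for one concept at two checkpoints, illustrating the kind
# of stage-wise change the paper tracks (edges appear, weights shift).
ckpt_early = [("emb", "attn.3", 0.4), ("attn.3", "mlp.5", 0.6),
              ("mlp.5", "out", 0.9)]
ckpt_late  = [("emb", "attn.3", 0.3), ("attn.3", "mlp.5", 0.5),
              ("mlp.5", "out", 0.8), ("emb", "mlp.5", 0.2)]

for name, edges in [("early", ckpt_early), ("late", ckpt_late)]:
    m = circuit_metrics(edges)
    print(name, m["num_edges"], round(m["total_weight"], 2))
```

Tracking such metrics over a sequence of checkpoints is one way to quantify when a circuit strengthens (learning) or weakens (forgetting) without rerunning behavioral evaluations at every step.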
Why it matters?
Understanding how LLMs learn and forget is crucial for making them more reliable and useful. This research provides a way to 'look inside' the model and see what's happening when it learns, which could help us design better training methods. Ultimately, this could lead to LLMs that are less prone to errors, more capable of transferring knowledge, and easier to understand.
Abstract
Human beings primarily understand the world through concepts (e.g., dog), abstract mental representations that structure perception, reasoning, and learning. However, how large language models (LLMs) acquire, retain, and forget such concepts during continual pretraining remains poorly understood. In this work, we study how individual concepts are acquired and forgotten, as well as how multiple concepts interact through interference and synergy. We link these behavioral dynamics to LLMs' internal Concept Circuits, computational subgraphs associated with specific concepts, and incorporate Graph Metrics to characterize circuit structure. Our analysis reveals: (1) LLMs' concept circuits provide a non-trivial, statistically significant signal of concept learning and forgetting; (2) concept circuits exhibit a stage-wise temporal pattern during continual pretraining, with an early increase followed by a gradual decrease and stabilization; (3) concepts with larger learning gains tend to exhibit greater forgetting under subsequent training; (4) semantically similar concepts induce stronger interference than weakly related ones; (5) concepts differ in their transferability, with some significantly facilitating the learning of others. Together, our findings offer a circuit-level view of concept learning dynamics and inform the design of more interpretable and robust concept-aware training strategies for LLMs.