Generalization or Memorization: Dynamic Decoding for Mode Steering
Xuanming Zhang
2025-10-29
Summary
This paper investigates why large language models (LLMs) sometimes generalize well to new problems and sometimes merely reproduce material memorized from their training data, and proposes a way to make their behavior more consistently reliable.
What's the problem?
LLMs are unpredictable. They can be great at generalizing, applying knowledge to new situations, but they also frequently just repeat information they've seen in training without actually *understanding* it. This makes them untrustworthy when accuracy matters, such as in important decision-making processes, because you can't always tell whether they're reasoning or just recalling.
What's the solution?
The researchers developed a system called Dynamic Mode Steering (DMS). It works by first detecting *how* the LLM is currently answering, whether it's generalizing or memorizing, using a quick check of the model's internal activations. Then DMS subtly adjusts the model's computation to encourage it to rely more on generalization and less on memorization. Think of it as gently guiding the model toward reasoning through a problem instead of just repeating facts.
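The two-step recipe, probe then steer, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the probe weights, the steering vector, the dimension, and all function names here are hypothetical stand-ins (in the paper these would be learned from the model's activations).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hidden-state dimension (illustrative)

# (1) Lightweight linear probe: reads a memorization score off a
# hidden state h. In the paper the weights are trained offline;
# here they are random placeholders.
w_probe = rng.normal(size=D)
b_probe = 0.0

def memorization_score(h):
    """Sigmoid of a linear read-out of the hidden state."""
    return 1.0 / (1.0 + np.exp(-(w_probe @ h + b_probe)))

# (2) Steering: a pre-identified "generalization" direction that the
# hidden state is nudged along when the probe fires.
v_gen = rng.normal(size=D)
v_gen /= np.linalg.norm(v_gen)  # unit steering vector

def dynamic_mode_steer(h, alpha=2.0, threshold=0.5):
    """If the probe says the model is memorizing, push the hidden
    state toward the generalization direction, scaled by the score."""
    s = memorization_score(h)
    if s > threshold:
        h = h + alpha * s * v_gen
    return h

h = rng.normal(size=D)          # stand-in for one token's hidden state
h_steered = dynamic_mode_steer(h)
```

In a real model this intervention would run inside the forward pass (for example via a layer hook), once per generated token, which is what makes the steering "dynamic" rather than a fixed offset.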
Why does it matter?
This work is important because it offers a way to build more trustworthy LLMs. By understanding and controlling when a model is generalizing versus memorizing, we can improve their accuracy and consistency, making them more useful and dependable for real-world applications where reliability is crucial.
Abstract
Large Language Models (LLMs) exhibit a troubling duality, capable of both remarkable generalization and brittle, verbatim memorization of their training data. This unpredictability undermines their reliability in high-stakes applications. In this work, we propose a unified framework to understand, identify, and control these distinct reasoning modes. First, we introduce a theoretical model based on the Information Bottleneck (IB) principle, formalizing generalization as the learning of a compressed, task-relevant representation and memorization as a failure to compress. Building on this theory, we develop Dynamic Mode Steering (DMS), a novel inference-time algorithm which comprises two components: (1) a lightweight, causally-grounded linear probe that identifies the model's instantaneous reliance on memorization, and (2) a dynamic activation steering mechanism that nudges the model's computation towards pre-identified generalization circuits. We frame DMS as a form of adaptive, self-contrastive decoding. Experiments on reasoning and faithfulness tasks demonstrate that DMS significantly improves logical consistency and factual accuracy, thereby offering a principled approach to enhancing LLM reliability.
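For readers unfamiliar with the Information Bottleneck principle the abstract builds on, the standard IB objective (due to Tishby et al., stated here in its usual textbook form rather than the paper's notation) trades compression of the input against predictiveness of the target:

$$
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
$$

where $X$ is the input, $Y$ the task target, $Z$ the learned representation, and $\beta > 0$ weights task relevance against compression. Under this reading, generalization corresponds to low $I(X;Z)$ with high $I(Z;Y)$ (a compressed, task-relevant representation), while memorization corresponds to a failure to compress: $Z$ retains input detail beyond what predicting $Y$ requires.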