KV Cache Steering for Inducing Reasoning in Small Language Models
Max Belitsky, Dawid J. Kopiczko, Michael Dorkenwald, M. Jehanzeb Mirza, Cees G. M. Snoek, Yuki M. Asano
2025-07-14
Summary
This paper introduces KV cache steering, a technique that improves reasoning in small language models through a single, one-time modification of the model's key-value cache before generation begins.
What's the problem?
Existing steering methods guide a language model's reasoning by continuously modifying its internal activations at every generation step, which is complicated, slow, and unstable, especially for smaller models.
What's the solution?
The researchers introduce cache steering, which modifies only the model's key-value cache, once, before text generation starts. The one-time edit applies steering vectors derived from reasoning traces produced by larger models, inducing step-by-step reasoning in smaller models without retraining or complex prompting.
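The core mechanic can be sketched in a few lines. The snippet below is an illustrative assumption of how such an intervention might look, not the paper's exact implementation: steering vectors are extracted as a difference of means between runs with and without reasoning traces (a standard activation-steering recipe), and are then added once to the cached key/value states of the final prompt token. The function names, shapes, and coefficients (`c_k`, `c_v`) are all hypothetical.

```python
import numpy as np

def extract_steering_vector(pos_states, neg_states):
    """Difference-of-means extraction (assumed recipe): average hidden
    states from examples WITH reasoning traces minus those WITHOUT.

    pos_states, neg_states: arrays of shape [n_examples, head_dim].
    Returns a steering vector of shape [head_dim].
    """
    return pos_states.mean(axis=0) - neg_states.mean(axis=0)

def steer_kv_cache(keys, values, k_vec, v_vec, c_k=0.1, c_v=1.0):
    """One-shot cache steering: add steering vectors to the cached
    key/value states of the last prompt token, once, before generation.
    No further intervention happens during decoding.

    keys, values: [seq_len, head_dim] cache for one attention head/layer.
    c_k, c_v: steering strengths (illustrative values, not the paper's).
    """
    keys, values = keys.copy(), values.copy()
    keys[-1] += c_k * k_vec    # offset the last token's cached key
    values[-1] += c_v * v_vec  # offset the last token's cached value
    return keys, values
```

In a real model the same edit would be applied per layer and per head to the framework's KV cache object; the point is that the cache is modified exactly once, so generation afterwards runs at normal speed.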
Why does it matter?
It offers an easy, efficient way to improve reasoning in smaller models, making advanced step-by-step reasoning more accessible without heavy compute, retraining, or complex engineering.
Abstract
Cache steering improves language model reasoning through a single intervention in the key-value cache, enhancing both structure and performance without fine-tuning.