Effectively Controlling Reasoning Models through Thinking Intervention

Tong Wu, Chong Xiang, Jiachen T. Wang, Prateek Mittal

2025-04-01

Summary

This paper introduces a way to steer reasoning-focused language models, which generate intermediate thinking steps before answering, so that their behavior can be controlled more precisely.

What's the problem?

Standard prompting only shapes the model's input, so it is hard to precisely control the internal reasoning process itself; when that reasoning goes astray, the model can make errors, ignore instructions, or produce unsafe outputs.

What's the solution?

The researchers developed a technique called 'Thinking Intervention,' which guides the model's reasoning by inserting or revising specific tokens within its chain of thought, rather than only modifying the input prompt.

Why it matters?

This work matters because steering the reasoning process directly improves accuracy, instruction following, and safety: on open-source DeepSeek R1 models, it yields up to 6.7% higher instruction-following accuracy, 15.4% better reasoning about instruction hierarchies, and a 40.0% increase in refusals of unsafe prompts.

Abstract

Reasoning-enhanced large language models (LLMs) explicitly generate intermediate reasoning steps prior to generating final answers, helping the model excel in complex problem-solving. In this paper, we demonstrate that this emerging generation framework offers a unique opportunity for more fine-grained control over model behavior. We propose Thinking Intervention, a novel paradigm designed to explicitly guide the internal reasoning processes of LLMs by strategically inserting or revising specific thinking tokens. We conduct comprehensive evaluations across multiple tasks, including instruction following on IFEval, instruction hierarchy on SEP, and safety alignment on XSTest and SORRY-Bench. Our results demonstrate that Thinking Intervention significantly outperforms baseline prompting approaches, achieving up to 6.7% accuracy gains in instruction-following scenarios, 15.4% improvements in reasoning about instruction hierarchies, and a 40.0% increase in refusal rates for unsafe prompts using open-source DeepSeek R1 models. Overall, our work opens a promising new research avenue for controlling reasoning LLMs.
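As a rough illustration of the idea, an intervention can be applied by seeding the model's thinking segment with a guiding sentence before letting it continue decoding. The tag name `<think>` and the helper below are illustrative assumptions, not the paper's exact interface:

```python
# Minimal sketch of Thinking Intervention: insert guidance tokens at the
# start of the model's reasoning segment. The "<think>" tag and function
# names are illustrative; actual tags vary by model family.
def build_prompt_with_intervention(user_prompt, intervention, think_open="<think>"):
    """Place an intervention string right after the thinking-start tag,
    so the model continues its chain of thought conditioned on it."""
    return f"{user_prompt}\n{think_open}\n{intervention}\n"


prompt = build_prompt_with_intervention(
    "Summarize this document in exactly three bullet points.",
    "I should first check the user's formatting constraints and follow them exactly.",
)
print(prompt)
```

The resulting string would be passed to the model as a decoding prefix, so generation resumes mid-thought with the intervention already in place; revising tokens in an existing reasoning trace would work analogously by editing the prefix before continuation.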