ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
Qianyu He, Siyu Yuan, Xuefeng Li, Mingxuan Wang, Jiangjie Chen
2025-08-27
Summary
This paper introduces ThinkDial, a new open-source system for large language models that allows users to control how much 'thinking' the model does when solving problems, balancing speed and accuracy.
What's the problem?
Large language models are really good at complex tasks when they're allowed to think through them step-by-step, but this process can take a lot of computing power and time. While some companies have built ways to control this 'thinking' process through different operational modes, equivalent tools haven't been available for open-source models.
What's the solution?
The researchers developed ThinkDial, which lets you switch between three modes: High (full thinking), Medium (faster with a small accuracy drop), and Low (even faster with a slightly bigger accuracy drop). They trained the model in two stages: first, supervised fine-tuning teaches it to respond differently in each mode, and then reinforcement learning with a shaped reward fine-tunes its performance per mode, ensuring it stays accurate even when thinking less.
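The budget-aware reward idea in the second training stage can be sketched as a small function: reward correct answers, then subtract a penalty when a response overshoots its mode's token budget. The budget fractions, penalty weight, and formula below are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of budget-aware reward shaping for ThinkDial-style
# training. Mode budget fractions and the penalty weight are illustrative
# guesses, not the paper's actual values.

MODE_BUDGET_FRACTION = {
    "high": 1.00,    # full reasoning: no compression target
    "medium": 0.50,  # target roughly half the full-mode token count
    "low": 0.25,     # target roughly a quarter of the full-mode token count
}

def shaped_reward(correct: bool, tokens_used: int,
                  full_mode_tokens: int, mode: str,
                  penalty_weight: float = 0.5) -> float:
    """Reward correctness, then penalize exceeding the mode's token budget."""
    budget = MODE_BUDGET_FRACTION[mode] * full_mode_tokens
    accuracy_reward = 1.0 if correct else 0.0
    # Penalty grows linearly with how far the response overshoots its budget.
    overshoot = max(0.0, tokens_used - budget) / max(budget, 1.0)
    return accuracy_reward - penalty_weight * overshoot
```

For example, a correct Medium-mode answer that uses 600 tokens against a 1000-token full-mode baseline overshoots its 500-token budget by 20% and receives a reward of 0.9 instead of 1.0.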
Why it matters?
ThinkDial is important because it brings the ability to control a language model's reasoning process to the open-source community. This means researchers and developers can now build faster and more efficient applications using these powerful models without sacrificing too much accuracy, making them more practical for real-world use.
Abstract
Large language models (LLMs) with chain-of-thought reasoning have demonstrated remarkable problem-solving capabilities, but controlling their computational effort remains a significant challenge for practical deployment. Recent proprietary systems like OpenAI's gpt-oss series have introduced discrete operational modes for intuitive reasoning control, but the open-source community has largely failed to achieve such capabilities. In this paper, we introduce ThinkDial, the first open-recipe end-to-end framework that successfully implements gpt-oss-style controllable reasoning through discrete operational modes. Our system enables seamless switching between three distinct reasoning regimes: High mode (full reasoning capability), Medium mode (50 percent token reduction with <10 percent performance degradation), and Low mode (75 percent token reduction with <15 percent performance degradation). We achieve this through an end-to-end training paradigm that integrates budget-mode control throughout the entire pipeline: budget-mode supervised fine-tuning that embeds controllable reasoning capabilities directly into the learning process, and two-phase budget-aware reinforcement learning with adaptive reward shaping. Extensive experiments demonstrate that ThinkDial achieves target compression-performance trade-offs with clear response length reductions while maintaining performance thresholds. The framework also exhibits strong generalization capabilities on out-of-distribution tasks.
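One way the discrete operational modes described above could be surfaced at inference time is a control tag prepended to the user's prompt. The tag format and helper below are assumptions for illustration; the paper's actual SFT template may differ.

```python
# Hypothetical sketch of mode-tagged prompting for a ThinkDial-style model.
# The control-tag syntax is an assumption, not the paper's actual template.

VALID_MODES = ("high", "medium", "low")

def build_prompt(question: str, mode: str = "high") -> str:
    """Prefix the question with a discrete reasoning-effort control tag."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"<reasoning_effort: {mode}>\n{question}"
```

During budget-mode supervised fine-tuning, each training example would carry the tag matching the length regime of its reasoning trace, so the deployed model can be steered by tag alone.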