Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
Chenlong Wang, Yuanning Feng, Dongping Chen, Zhaoyang Chu, Ranjay Krishna, Tianyi Zhou
2025-06-17
Summary
This paper introduces NoWait, a method that makes AI models reason more efficiently by suppressing explicit 'thinking tokens' (words such as "Wait" that trigger step-by-step pause-and-reflect behavior) during inference. Rather than forcing the model to think out loud at every step, NoWait lets it reason more directly while still performing well on multimodal tasks involving images and text.
What's the problem?
Many reasoning models emit special tokens that act as pauses for careful step-by-step self-reflection. While these thinking tokens can improve reasoning quality, they lengthen the output and consume more time and compute, reducing speed and efficiency — especially on multimodal inputs that combine images and text.
What's the solution?
The solution is NoWait, which suppresses these explicit self-reflection tokens during inference, so the model never pauses to think out loud. Because the suppression happens at inference time rather than through retraining, the model keeps its strong reasoning performance on multimodal tasks while producing shorter outputs, cutting both time and computing power.
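The general idea of inference-time token suppression can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the token ids and toy vocabulary are invented, and a real setup would look up the ids of words like "Wait" in the model's tokenizer and apply the mask inside the decoding loop.

```python
import math

def suppress_tokens(logits, banned_ids):
    """Return a copy of `logits` with banned token ids masked to -inf,
    so those tokens can never be sampled."""
    out = list(logits)
    for tid in banned_ids:
        out[tid] = -math.inf
    return out

def greedy_next_token(logits, banned_ids):
    """Greedily pick the highest-scoring token after suppression."""
    masked = suppress_tokens(logits, banned_ids)
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy example: in this 4-token vocabulary, pretend id 2 is a "Wait" token.
logits = [0.1, 1.5, 3.0, 2.2]            # id 2 would win unsuppressed
print(greedy_next_token(logits, {2}))    # prints 3: decoding skips "Wait"
```

In practice, mainstream inference libraries expose hooks for exactly this kind of per-step logit masking, so suppressing a small set of tokens adds negligible overhead.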
Why it matters?
This matters because faster, more efficient reasoning at the same level of performance makes AI models more practical for real-world uses such as interpreting images alongside text or answering complex questions. By removing unnecessary pauses from the reasoning process, NoWait makes these systems quicker and cheaper to run, improving their usefulness and accessibility.
Abstract
NoWait suppresses explicit self-reflection tokens during inference to enhance efficiency in multimodal reasoning without reducing model utility.