Learning a Continue-Thinking Token for Enhanced Test-Time Scaling

Liran Ringel, Elad Tolochinsky, Yaniv Romano

2025-06-16

Summary

This paper introduces a new way to help large language models (LLMs) think for longer before answering, using a special token called a continue-thinking token. Normally, when a model is about to stop reasoning, test-time scaling methods push it to keep going by appending a fixed, hand-picked word. Here, the authors instead add a single dedicated token and train only its embedding with reinforcement learning, so the model learns how to make the best use of extra thinking time during testing or inference.

What's the problem?

The problem is that existing test-time scaling methods extend a model's reasoning by appending a fixed token (such as the word "Wait") whenever the model tries to stop. That fixed token was never trained for this job, so it may not be the best possible signal for telling the model to keep thinking. Relying on a predefined token limits how much extra accuracy the additional inference-time computation can actually buy on hard problems.

What's the solution?

The solution is to learn a dedicated continue-thinking token. The model's weights stay frozen; only the embedding of this one new token is trained, using reinforcement learning that rewards the model when continued thinking leads to correct answers. At inference time, whenever the model would stop, the learned token is appended in place of a fixed word, prompting the model to keep refining its reasoning. Because the token was optimized for exactly this role, it improves accuracy more than a predefined fixed token does.
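The inference-time loop described above can be sketched roughly as follows. This is a minimal toy illustration, not the paper's code: the model, the token strings, and the function names are hypothetical stand-ins, and in the real method the continue-thinking token's embedding is what gets trained by reinforcement learning.

```python
# Hypothetical sketch of "budget forcing" with a continue-thinking token.
# The real method trains CONTINUE_TOKEN's embedding via RL with the LLM frozen.

END_OF_THINKING = "</think>"              # marker where the model would stop reasoning
CONTINUE_TOKEN = "<|continue-thinking|>"  # learned token that prompts more reasoning

def toy_generate(prompt):
    """Stand-in for one LLM decoding call: returns the prompt extended
    with some reasoning text, ending at the end-of-thinking marker."""
    return prompt + " ...some reasoning... " + END_OF_THINKING

def generate_with_budget(prompt, extra_thinking_rounds=2):
    """Each time the model tries to stop, strip the stop marker and append
    the continue-thinking token, forcing another round of reasoning."""
    text = toy_generate(prompt)
    for _ in range(extra_thinking_rounds):
        text = text.removesuffix(END_OF_THINKING) + CONTINUE_TOKEN
        text = toy_generate(text)
    return text

out = generate_with_budget("Q: what is 2+2?", extra_thinking_rounds=2)
print(out.count(CONTINUE_TOKEN))  # the token was injected twice
```

The key design point is that the injected token is not a fixed word chosen by hand: it is the one piece of the system that gets optimized, so the model receives the most useful possible "keep thinking" signal.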

Why it matters?

This matters because it gives AI models a cheap way to think more deeply and flexibly at test time: only a single token embedding needs to be trained, while the rest of the model stays untouched. The result is better, more accurate answers from the same model, making AI systems more powerful and reliable for tough tasks like math, problem-solving, and reasoning over complicated information.

Abstract

A continue-thinking token whose embedding is learned via reinforcement learning improves language model accuracy during inference more effectively than a fixed token.