SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
Dachuan Shi, Abedelkadir Asi, Keying Li, Xiangchi Yuan, Leyan Pan, Wenke Lee, Wen Xiao
2025-10-07
Summary
This paper explores how large language models (LLMs) can 'think' to solve problems, going beyond just writing out step-by-step solutions. It focuses on 'latent reasoning', in which the model reasons internally in continuous latent space rather than in words, and addresses issues that arise when models get stuck between candidate solutions or spend too much time thinking without improving the answer.
What's the problem?
LLMs are getting better at reasoning, but when they think internally (latent reasoning) instead of writing out steps, they can struggle. Because the model keeps many possible solution paths alive at once, its probability mass is spread thinly across them, which adds noise and makes it hard to settle on a single high-confidence answer. These models also tend to 'overthink': they keep processing even when extra steps no longer bring them closer to the right solution, wasting tokens and time.
What's the solution?
The researchers developed SwiReasoning, a training-free framework that helps LLMs balance internal thinking with explicitly writing out steps. It estimates how confident the model is from the entropy of its next-token predictions, measured block by block, and switches between latent and explicit reasoning based on that confidence: uncertain blocks stay latent to explore, while confident blocks switch to explicit steps to converge. It also caps the number of times the model can switch between these modes, which curbs overthinking and improves efficiency.
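The switching logic described above can be sketched as a small controller. This is a hypothetical illustration, not the authors' implementation: the class name, the entropy threshold, and the switch cap are all assumed values chosen for clarity.

```python
# Hypothetical sketch of a confidence-gated mode switcher in the spirit of
# SwiReasoning. Names and thresholds are illustrative, not the paper's code.
from enum import Enum


class Mode(Enum):
    LATENT = "latent"      # reason internally in latent space
    EXPLICIT = "explicit"  # write out chain-of-thought tokens


class SwitchController:
    def __init__(self, entropy_threshold=2.0, max_switches=4):
        self.entropy_threshold = entropy_threshold  # confidence cutoff (assumed)
        self.max_switches = max_switches            # cap on mode flips to curb overthinking
        self.mode = Mode.LATENT
        self.switches = 0

    def update(self, block_entropy):
        """Pick the mode for the next thinking block from the previous block's
        average next-token entropy (low entropy = high confidence)."""
        if self.switches >= self.max_switches:
            return self.mode  # switch budget exhausted: keep the current mode
        confident = block_entropy < self.entropy_threshold
        target = Mode.EXPLICIT if confident else Mode.LATENT
        if target != self.mode:
            self.mode = target
            self.switches += 1
        return self.mode
```

For example, a controller fed a high-entropy block stays latent (exploration), and switches to explicit once a block comes in below the threshold (exploitation); after the cap is hit, it stops flipping regardless of the signal.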
Why it matters?
This work is important because it makes LLMs more accurate and efficient at solving complex problems. By improving how models reason, especially when they're working internally, we can get better results with less computational cost. This is particularly useful when resources are limited, like when running models on devices with less processing power or when trying to solve problems quickly.
Abstract
Recent work shows that, beyond discrete reasoning through explicit chain-of-thought steps, which are limited by the boundaries of natural languages, large language models (LLMs) can also reason continuously in latent space, allowing richer information per step and thereby improving token efficiency. Despite this promise, latent reasoning still faces two challenges, especially in training-free settings: 1) purely latent reasoning broadens the search distribution by maintaining multiple implicit paths, which diffuses probability mass, introduces noise, and impedes convergence to a single high-confidence solution, thereby hurting accuracy; and 2) overthinking persists even without explicit text, wasting tokens and degrading efficiency. To address these issues, we introduce SwiReasoning, a training-free framework for LLM reasoning which features two key innovations: 1) SwiReasoning dynamically switches between explicit and latent reasoning, guided by block-wise confidence estimated from entropy trends in next-token distributions, to balance exploration and exploitation and promote timely convergence. 2) By limiting the maximum number of thinking-block switches, SwiReasoning curbs overthinking and improves token efficiency across varying problem difficulties. On widely used mathematics and STEM benchmarks, SwiReasoning consistently improves average accuracy by 1.5%-2.8% across reasoning LLMs of different model families and scales. Furthermore, under constrained budgets, SwiReasoning improves average token efficiency by 56%-79%, with larger gains as budgets tighten.
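The abstract's confidence signal is the entropy of the next-token distribution, aggregated block-wise so that a falling trend can be read as convergence toward a single answer. A minimal NumPy sketch of that computation, with illustrative function names not taken from the paper:

```python
# Illustrative block-wise entropy signal: Shannon entropy of each next-token
# distribution, averaged per thinking block. A decreasing sequence of block
# entropies suggests the model is converging on one high-confidence solution.
import numpy as np


def token_entropy(logits):
    """Entropy (in nats) of the softmax distribution over the vocabulary."""
    z = logits - logits.max()            # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())


def block_entropies(blocks_of_logits):
    """Mean token entropy for each block of next-token logits."""
    return [float(np.mean([token_entropy(l) for l in block]))
            for block in blocks_of_logits]
```

A sharply peaked distribution yields near-zero entropy (high confidence), while a uniform distribution over V tokens yields the maximum, log V.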