Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

Donghang Wu, Haoyang Zhang, Jun Chen, Xiangyu Zhang, Hexin Liu, Eng Siong Chng, Fei Tian, Xuerui Yang, Daxin Jiang, Gang Yu

2025-10-13

Summary

This paper introduces a way for spoken language models (AI systems that can both converse and reason) to reason in real time, much as people do. It tackles the problem that these models become slow when they try to think through a problem before speaking.

What's the problem?

Current spoken language models struggle with 'Chain-of-Thought' reasoning – where they need to think step-by-step to solve a problem – because generating each step of the thought process takes time. This delay makes them unable to have natural, real-time conversations. They either rush their reasoning and make mistakes, or they take too long to respond, making the interaction feel unnatural.

What's the solution?

The researchers created a system called 'Mind-Paced Speaking' (MPS). Inspired by how the human brain uses distinct regions for thinking and responding, it separates the model into two parts. One part, the 'Formulation Brain,' focuses on high-level reasoning and plans what to say. The other part, the 'Articulation Brain,' turns those thoughts into fluent speech. This division of labor lets the model think and speak at the same time, avoiding long pauses while preserving the quality of its reasoning.
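To make the pacing idea concrete, here is a minimal toy sketch of the think-while-speaking loop described above. All names (`formulation_brain`, `articulation_brain`, `mind_paced_speaking`) and the canned reasoning steps are illustrative assumptions, not the paper's actual models: the point is only that each high-level reasoning step immediately paces one chunk of speech, so speech output can begin before the full chain of thought is finished.

```python
def formulation_brain(question):
    """Stand-in for the reasoning model: yields high-level steps one at a time."""
    steps = [
        "parse the question",
        "recall the relevant facts",
        "derive the answer",
    ]
    for step in steps:
        yield step  # each step becomes available before later steps are computed

def articulation_brain(step):
    """Stand-in for the speech model: turns one reasoning step into a fluent chunk."""
    return f"Let me {step}."

def mind_paced_speaking(question):
    """Interleave thinking and speaking: every reasoning step paces one
    spoken chunk, instead of waiting for the entire chain of thought."""
    spoken_chunks = []
    for step in formulation_brain(question):
        spoken_chunks.append(articulation_brain(step))
    return " ".join(spoken_chunks)

print(mind_paced_speaking("What is 2 + 2?"))
```

Because the generator yields steps incrementally, the first spoken chunk is produced after only the first reasoning step, which is the latency win the dual-brain design is after.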

Why it matters?

This work matters because it significantly improves the ability of AI to hold intelligent, real-time conversations. It lets these models reason effectively *while* they are speaking, bridging the gap between accurate thinking and quick responses. The results show that MPS matches the reasoning accuracy of models that pre-compute their full chain of thought before speaking, but without the long delays, making it a notable step towards more natural and helpful AI assistants.

Abstract

Real-time Spoken Language Models (SLMs) struggle to leverage Chain-of-Thought (CoT) reasoning due to the prohibitive latency of generating the entire thought process sequentially. Enabling SLMs to think while speaking, similar to humans, is attracting increasing attention. We present, for the first time, Mind-Paced Speaking (MPS), a brain-inspired framework that enables high-fidelity, real-time reasoning. Similar to how humans utilize distinct brain regions for thinking and responding, we propose a novel dual-brain approach, employing a "Formulation Brain" for high-level reasoning to pace and guide a separate "Articulation Brain" for fluent speech generation. This division of labor eliminates mode-switching, preserving the integrity of the reasoning process. Experiments show that MPS significantly outperforms existing think-while-speaking methods and achieves reasoning performance comparable to models that pre-compute the full CoT before speaking, while drastically reducing latency. Under a zero-latency configuration, the proposed method achieves an accuracy of 92.8% on the mathematical reasoning task Spoken-MQA and attains a score of 82.5 on the speech conversation task URO-Bench. Our work effectively bridges the gap between high-quality reasoning and real-time interaction.