STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
Cheng-Han Chiang, Xiaofei Wang, Linjie Li, Chung-Ching Lin, Kevin Lin, Shujie Liu, Zhendong Wang, Zhengyuan Yang, Hung-yi Lee, Lijuan Wang
2025-07-22
Summary
This paper introduces STITCH, a method for spoken language models that lets the model think and talk at the same time by alternating between chunks of silent, unspoken reasoning and chunks of the spoken response.
What's the problem?
Spoken language models face a trade-off: if the model reasons through a problem fully before speaking, the user sits through a long silence, but if it speaks immediately without reasoning, its answers to harder questions suffer. Either way, responses end up slower or less accurate than they should be.
What's the solution?
The authors created STITCH, which splits both the reasoning and the spoken response into small chunks and interleaves them: the model generates a chunk of silent reasoning, then a chunk of spoken output, and repeats. Because the audio of a spoken chunk takes time to play back, the model can generate the next reasoning chunk during that playback, so the silent thinking adds little extra waiting time while improving performance on challenging tasks like math reasoning.
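The interleaving described above can be sketched conceptually in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class, its method names, and the fixed one-to-one chunk alternation are all assumptions made for clarity.

```python
# Conceptual sketch of STITCH-style chunked interleaving (illustrative only,
# not the paper's implementation). Reasoning chunks stay internal; spoken
# chunks are what the user hears.
from dataclasses import dataclass, field


@dataclass
class ChunkedResponder:
    """Alternates unspoken reasoning chunks with spoken response chunks."""
    reasoning_steps: list           # silent "thinking" text, pre-split into chunks
    answer_chunks: list             # spoken response, pre-split into chunks
    transcript: list = field(default_factory=list)

    def respond(self):
        # Interleave one reasoning chunk (kept internal) with one spoken
        # chunk (played to the user). In STITCH, each reasoning chunk is
        # generated while the previous spoken chunk's audio is still
        # playing, so the thinking adds little user-perceived latency.
        for think, speak in zip(self.reasoning_steps, self.answer_chunks):
            self.transcript.append(("reason", think))   # unspoken
            self.transcript.append(("speak", speak))    # heard by user
        # The user only ever receives the spoken chunks.
        return [text for kind, text in self.transcript if kind == "speak"]
```

For example, answering "What is 2 + 2?" might interleave a reasoning chunk like "2 + 2 = 4" with spoken chunks such as "Let me see." and "The answer is 4.", with only the spoken chunks reaching the listener.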
Why does it matter?
This matters because it makes AI conversations more natural and efficient, allowing voice assistants and other spoken AI systems to respond quickly while still providing smart and accurate answers.
Abstract
STITCH is a novel method that enables simultaneous thinking and talking in spoken language models by alternating between generating unspoken reasoning chunks and spoken response chunks, achieving low latency and improved performance on math reasoning tasks.