Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

2024-06-25

Summary

This paper introduces duplex models, a new approach that allows large language models (LLMs) to have real-time conversations by listening and responding to users simultaneously, rather than waiting for users to finish speaking before generating replies.

What's the problem?

Traditional chat systems using LLMs are turn-based, meaning users have to wait until the model finishes generating its response before they can interact again. This can make conversations feel unnatural and slow, as it doesn't mimic how humans communicate in real life, where interruptions and back-and-forth exchanges happen naturally.

What's the solution?

To solve this problem, the authors developed duplex models that can process user inputs while generating responses. They achieved this by dividing conversation inputs and outputs into smaller segments called time slices and interleaving them with a time-division-multiplexing (TDM) strategy, so the model alternates between listening and speaking rapidly enough to appear simultaneous. They also created a special fine-tuning dataset of real-time interactions, with alternating query and response slices and typical feedback types such as interruptions, helping the models learn to respond quickly and appropriately to user inputs.
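The time-slice interleaving idea can be sketched as a simple loop: consume one incoming user slice (if any), then emit one model slice, so the model can adjust or stop mid-response. This is only an illustrative toy, not the paper's actual implementation; `duplex_loop`, `generate_slice`, and `toy_generate` are hypothetical names introduced here for the sketch.

```python
from collections import deque

def duplex_loop(user_slices, generate_slice, max_steps=20):
    """Interleave listening and speaking in alternating time slices:
    each step consumes at most one user slice before emitting at most one
    model slice, so the model reacts mid-response (e.g. to interruptions)."""
    context = []                       # alternating (speaker, slice) pairs
    pending = deque(user_slices)       # user input arriving over time
    for _ in range(max_steps):
        if pending:
            context.append(("user", pending.popleft()))
        out = generate_slice(context)  # model produces one slice, or None
        if out is not None:
            context.append(("model", out))
        if not pending and out is None:
            break                      # both sides idle: conversation over
    return context

# Toy "model": acknowledges each user slice, goes silent on "stop".
def toy_generate(context):
    last = context[-1] if context else None
    if last and last[0] == "user" and last[1] != "stop":
        return f"ack:{last[1]}"
    return None

history = duplex_loop(["hi", "tell me more", "stop"], toy_generate)
```

A real duplex model would replace `toy_generate` with an LLM decoding step over the interleaved slice sequence; the point of the sketch is only the scheduling pattern, where input can land between output slices instead of waiting for a full turn.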

Why it matters?

This research is significant because it enhances the way AI systems interact with people, making conversations feel more fluid and human-like. By allowing for real-time feedback and interruptions, duplex models could greatly improve user satisfaction in applications like virtual assistants and customer support chatbots, making them more effective and engaging.

Abstract

As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can listen for users while generating output and dynamically adjust themselves to provide users with instant feedback. Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset consisting of alternating time slices of queries and responses as well as covering typical feedback types in instantaneous interactions. Our experiments show that although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluation indicate that duplex models make user-AI interactions more natural and human-like, and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.