DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
Jiho Kim, Woosog Chay, Hyeonji Hwang, Daeun Kyung, Hyunseung Chung, Eunbyeol Cho, Yohan Jo, Edward Choi
2024-06-26

Summary
This paper introduces DialSim, a new tool designed to test how well conversational agents (like chatbots) understand and engage in long conversations. It simulates real-time dialogues by using characters from popular TV shows.
What's the problem?
As conversational agents have become more capable, it is important to evaluate their performance in realistic scenarios. However, many existing evaluation methods overlook the complexities of real-life conversations, such as responding in real time, handling multiple speakers, and remembering information from much earlier in the dialogue. This gap makes it hard to assess how well these agents would actually perform in practice.
What's the solution?
DialSim addresses this issue with a real-time dialogue simulator in which an agent takes on the role of a character from a TV show. The agent must respond to spontaneous questions while keeping track of the preceding dialogue and distinguishing between what it knows and what it doesn't. DialSim evaluates the agent's ability to respond within a time limit, manage long multi-party conversations, and handle adversarial settings, such as swapped character names, which test whether the agent relies on its pre-trained knowledge of the show rather than on the dialogue itself. This setup allows for a more thorough evaluation of the agent's long-term dialogue understanding.
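The evaluation protocol described above can be pictured as a loop: the simulator streams dialogue turns to the agent, occasionally interrupts with a spontaneous question, and scores the timed answer (including whether the agent correctly says it doesn't know). The sketch below is a minimal illustration of that idea, not DialSim's actual code; `run_dialsim_episode`, `toy_agent`, and the scoring rules are hypothetical simplifications.

```python
import time

def run_dialsim_episode(agent, dialogue_turns, questions, time_limit=1.0):
    """Stream scripted dialogue turns to the agent; at scripted points,
    ask a spontaneous question and score the timed answer.
    `questions` maps a turn index to (question, gold_answer); gold_answer
    is None when the correct behavior is to answer "unknown"."""
    history, correct, total = [], 0, 0
    for i, (speaker, utterance) in enumerate(dialogue_turns):
        history.append((speaker, utterance))
        if i in questions:
            question, gold = questions[i]
            start = time.monotonic()
            answer = agent(history, question)
            elapsed = time.monotonic() - start
            total += 1
            # An answer only counts if it arrives within the time limit.
            if elapsed <= time_limit:
                if gold is None and answer == "unknown":
                    correct += 1  # correctly abstained
                elif gold is not None and answer == gold:
                    correct += 1
    return correct / total if total else 0.0

# A trivial keyword agent used only to exercise the loop: it answers
# from the raw history and says "unknown" otherwise.
def toy_agent(history, question):
    for speaker, utterance in history:
        if "coffee" in question and "coffee" in utterance:
            return speaker
    return "unknown"

turns = [("Ross", "I spilled my coffee."), ("Rachel", "Again?")]
questions = {1: ("Who mentioned coffee?", "Ross")}
print(run_dialsim_episode(toy_agent, turns, questions))  # → 1.0
```

A real agent would replace `toy_agent` with an LLM call over (a retrieved subset of) the history, and the adversarial setting would correspond to swapping character names in `dialogue_turns` before the episode starts.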
Why it matters?
This research is important because it helps improve conversational AI by providing a better way to test how these systems perform in complex, real-world situations. By using DialSim, developers can identify strengths and weaknesses in their AI agents, leading to enhancements that make these systems more effective in engaging with humans naturally and meaningfully.
Abstract
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of these agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swapped character names) to challenge the agent's reliance on pre-trained knowledge. We utilized this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in the field of conversational AI. DialSim is available at https://github.com/jiho283/Simulator.