Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple Judges
Yuqi Tang, Kehua Feng, Yunfeng Wang, Zhiwen Chen, Chengfei Lv, Gang Yu, Qiang Zhang, Keyan Ding
2025-08-04
Summary
This paper introduces a system that evaluates multi-turn conversations more efficiently by distilling the judgments of several large language models (LLMs) into a single model.
What's the problem?
Evaluating conversations, especially ones with many back-and-forth exchanges, usually requires running several large models to get reliable judgments, which costs substantial compute and time.
What's the solution?
The paper trains a single model on the combined judgments of multiple LLMs, so it can predict dialogue quality accurately without running every judge model for each evaluation.
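The idea of learning one lightweight evaluator from several judges can be sketched as follows. This is a minimal, hypothetical illustration (the feature names, scores, and the linear-model choice are assumptions, not the paper's actual method): per-dialogue scores from multiple judges are aggregated into a soft target, and a small model is fit to those targets so that only it needs to run at evaluation time.

```python
def aggregate(judge_scores):
    """Combine per-judge quality scores (e.g. on a 1-10 scale) into one soft target."""
    return sum(judge_scores) / len(judge_scores)

def train_evaluator(features, targets, lr=0.1, epochs=2000):
    """Fit a tiny linear evaluator (w . x + b) to the aggregated judge scores via SGD."""
    n_feat = len(features[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        for x, y in zip(features, targets):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Score a dialogue with the distilled evaluator alone -- no judges needed."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy dialogues, each represented by two hand-made features
# (illustrative proxies, e.g. turn length and coherence) and rated by three judges.
features = [[0.9, 0.8], [0.2, 0.3], [0.6, 0.5]]
judges = [[8, 9, 8.5], [3, 2, 2.5], [6, 5, 5.5]]
targets = [aggregate(s) for s in judges]

w, b = train_evaluator(features, targets)
```

After training, `predict` approximates the averaged judge scores from features alone, which is the efficiency win: the expensive judges are consulted once to build the training targets, not at every evaluation.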
Why does it matter?
This makes it faster and cheaper to assess how well AI systems hold conversations, which in turn speeds up the improvement of chatbots and virtual assistants.
Abstract
An efficient multi-turn dialogue evaluator aggregates multiple LLM judgments into a single model to assess dialogue quality with reduced computational cost.