MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf

Lingxiang Hu, Shurun Yuan, Xiaoting Qin, Jue Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

2025-02-10

Summary

This paper tests whether AI language models can attend meetings on our behalf, acting as our representatives to save time and make meetings more productive.

What's the problem?

Meetings are important for work but they take up a lot of time, can be hard to schedule, and sometimes aren't very productive. People are wondering if advanced AI could help by attending meetings in our place.

What's the solution?

The researchers built a prototype system in which AI models act as meeting delegates and benchmarked several models on real meeting transcripts to see how well they could participate. Some models were more cautious in their responses, while others engaged more actively. Overall, about 60% of the models' responses addressed at least one key point from the actual meeting.
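The core decisions such a delegate has to make are when to speak, what key point to raise, and how to avoid repeating itself. The sketch below illustrates that loop under stated assumptions: the keyword-overlap heuristic is a stand-in for the actual LLM call, and every name here (`DelegateContext`, `decide_and_respond`) is illustrative, not from the paper's code.

```python
# Hedged sketch of a meeting-delegate turn handler. The LLM is stubbed out
# with a naive word-overlap heuristic so the example runs without an API key;
# the real system would pass the transcript and notes to a chat model.
from dataclasses import dataclass, field

@dataclass
class DelegateContext:
    principal: str                                  # person being represented
    key_points: list[str]                           # notes to convey if relevant
    shared: set[str] = field(default_factory=set)   # points already voiced

def decide_and_respond(ctx: DelegateContext, transcript_turn: str):
    """Decide whether to speak on the latest transcript turn.

    Speaks if the principal is addressed by name, or if an unshared key
    point appears topically related to the turn. Returns None to stay
    silent (the "cautious engagement" behavior observed in some models).
    """
    turn = transcript_turn.lower()
    if ctx.principal.lower() in turn:
        # Directly addressed: respond and cover any remaining key points.
        pending = [p for p in ctx.key_points if p not in ctx.shared]
        ctx.shared.update(pending)
        return f"On behalf of {ctx.principal}: " + "; ".join(
            pending or ["noted, nothing to add"]
        )
    for point in ctx.key_points:
        if point in ctx.shared:
            continue  # avoid repeating content already voiced
        overlap = set(point.lower().split()) & set(turn.split())
        if len(overlap) >= 2:  # crude relevance check in place of the LLM
            ctx.shared.add(point)
            return f"{ctx.principal} asked me to mention: {point}"
    return None
```

For example, a delegate holding the note "the budget review slipped to Friday" would speak up when the budget review is raised, then stay silent if the same topic recurs. This tracking of already-shared points mirrors the paper's finding that reducing repetitive content is one of the main open problems.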

Why it matters?

This matters because it could change how we handle meetings in the future. If AI can effectively represent us in some meetings, it could save time and make work more efficient. However, the study also shows there's still room for improvement, like making sure the AI doesn't repeat information or say irrelevant things. This research helps us understand both the potential and the challenges of using AI in workplace communication.

Abstract

In contemporary workplaces, meetings are essential for exchanging ideas and ensuring team alignment but often face challenges such as time consumption, scheduling conflicts, and inefficient participation. Recent advancements in Large Language Models (LLMs) have demonstrated their strong capabilities in natural language generation and reasoning, prompting the question: can LLMs effectively delegate participants in meetings? To explore this, we develop a prototype LLM-powered meeting delegate system and create a comprehensive benchmark using real meeting transcripts. Our evaluation reveals that GPT-4/4o maintain balanced performance between active and cautious engagement strategies. In contrast, Gemini 1.5 Pro tends to be more cautious, while Gemini 1.5 Flash and Llama3-8B/70B display more active tendencies. Overall, about 60% of responses address at least one key point from the ground-truth. However, improvements are needed to reduce irrelevant or repetitive content and enhance tolerance for transcription errors commonly found in real-world settings. Additionally, we implement the system in practical settings and collect real-world feedback from demos. Our findings underscore the potential and challenges of utilizing LLMs as meeting delegates, offering valuable insights into their practical application for alleviating the burden of meetings.