UserBench: An Interactive Gym Environment for User-Centric Agents
Cheng Qian, Zuxin Liu, Akshara Prabhakar, Zhiwei Liu, Jianguo Zhang, Haolin Chen, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang
2025-08-12
Summary
This paper introduces UserBench, a new testing environment that evaluates how well AI agents built on large language models can interact with users over multiple turns, especially when users begin with vague or underspecified goals. It focuses on whether these agents can understand users, ask clarifying questions about preferences, and use tools to complete tasks effectively.
What's the problem?
The problem is that while AI agents have improved at using tools and solving tasks, they often fail to truly understand what users want, especially when user goals are unclear, change over time, or are indirectly expressed. This means agents may complete tasks but still not satisfy the user's real needs because they can't adapt or collaborate well.
What's the solution?
The paper introduces UserBench as a gym-like environment where simulated users interact with AI agents, starting with unclear instructions and revealing their preferences gradually. The agents must proactively ask questions, interpret subtle hints, and use tools wisely to stay aligned with evolving user goals. Experiments on UserBench reveal that current AI models struggle to fully capture user intent and preferences despite their strong tool-use abilities.
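To make the interaction pattern concrete, here is a minimal sketch of a gym-style loop in which a simulated user starts with a vague goal and reveals hidden preferences only when the agent asks clarifying questions. All names (UserBenchEnv, SimulatedUser, the reward scheme) are hypothetical illustrations, not the paper's actual API or scoring method.

```python
# Minimal sketch of a gym-style agent-user interaction loop.
# Names and reward logic are assumptions for illustration only.

from dataclasses import dataclass


@dataclass
class SimulatedUser:
    """A user with an underspecified goal and hidden preferences revealed over turns."""
    goal: str
    hidden_preferences: list[str]
    revealed: int = 0

    def respond(self, agent_message: str) -> str:
        # Reveal one more preference each time the agent asks a clarifying question.
        if "?" in agent_message and self.revealed < len(self.hidden_preferences):
            hint = self.hidden_preferences[self.revealed]
            self.revealed += 1
            return f"Good question. Actually, {hint}."
        return "Sounds okay so far."


@dataclass
class UserBenchEnv:
    """Gym-like environment: the agent observes user messages and acts turn by turn."""
    user: SimulatedUser
    max_turns: int = 8
    turn: int = 0

    def reset(self) -> str:
        self.turn = 0
        return f"User: {self.user.goal}"  # initial, intentionally vague instruction

    def step(self, action: str) -> tuple[str, float, bool]:
        self.turn += 1
        reply = self.user.respond(action)
        # Reward reflects how many hidden preferences the agent has uncovered.
        reward = self.user.revealed / len(self.user.hidden_preferences)
        done = self.turn >= self.max_turns or reward == 1.0
        return f"User: {reply}", reward, done


if __name__ == "__main__":
    user = SimulatedUser(
        goal="I want to book a trip next month.",
        hidden_preferences=["I prefer a direct flight", "my budget is under $800"],
    )
    env = UserBenchEnv(user)
    obs, done = env.reset(), False
    while not done:
        # A real agent would be an LLM; here we always ask a clarifying question.
        obs, reward, done = env.step("Could you tell me more about your preferences?")
        print(obs, f"(reward={reward:.2f})")
```

In this toy loop, an agent that never asks questions finishes its turns with low reward, mirroring the paper's point that completing a task is not the same as aligning with the user's real preferences.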
Why it matters?
This matters because for AI assistants to be truly helpful and user-friendly, they need to collaborate with people by understanding and adapting to their changing and sometimes unclear needs. UserBench provides a way to measure and improve this important skill, pushing AI toward becoming better interactive partners that work closely with users in real-world tasks.
Abstract
UserBench evaluates LLM-based agents in multi-turn interactions with simulated users, revealing gaps in task completion and user alignment.