User-Oriented Multi-Turn Dialogue Generation with Tool Use at Scale
Jungho Cho, Minbyul Jeong, Sungrae Park
2026-01-14
Summary
This paper focuses on creating better training data for AI agents that can use tools to help people, aiming for more realistic and helpful conversations.
What's the problem?
Currently, creating data to train these AI agents is difficult because the tools they can use are usually fixed and limited. When researchers *do* create data, the AI often just focuses on quickly finishing the task without much back-and-forth, unlike how a human would actually interact with an assistant. Existing methods don't capture the natural, extended conversations that happen when people work with tools over multiple turns.
What's the solution?
The researchers developed a system where an AI simulates both the user *and* the agent. Instead of just giving the agent a task, they created an AI 'user' that asks for things step-by-step and provides feedback, just like a real person would. This forces the agent to have a more extended conversation and use tools in a more realistic way. The system is also designed to easily create lots of this data, and even have the agent handle multiple tasks within a single conversation.
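The loop described above — a simulated user releasing a task one piece at a time and reacting to each agent reply — can be sketched as follows. This is a minimal illustration under assumed names (`UserSimulator`, `Agent`, `simulate_dialogue` are hypothetical, not the paper's API); in the actual pipeline each role would be driven by a large reasoning model rather than the stub logic here.

```python
# Minimal sketch of a user-oriented dialogue simulation loop.
# All class and function names are illustrative assumptions; a real
# pipeline would back each role with an LRM and dynamic tools.

class UserSimulator:
    """Releases a task incrementally and gives turn-by-turn feedback."""
    def __init__(self, subtasks):
        self.subtasks = list(subtasks)
        self.index = 0

    def next_request(self):
        # Issue one sub-request per turn instead of the whole task at once.
        if self.index < len(self.subtasks):
            request = self.subtasks[self.index]
            self.index += 1
            return request
        return None  # all sub-requests issued; conversation can end

    def feedback(self, agent_reply):
        # A real simulator would critique the reply with an LRM;
        # here we just acknowledge it.
        return f"ok, got: {agent_reply}"


class Agent:
    """Answers each request, calling a tool when one matches."""
    def __init__(self, tools):
        self.tools = tools

    def respond(self, request):
        tool = self.tools.get(request)
        return tool() if tool else f"(no tool) {request}"


def simulate_dialogue(user, agent):
    turns = []
    while (request := user.next_request()) is not None:
        reply = agent.respond(request)
        turns.append((request, reply, user.feedback(reply)))
    return turns


user = UserSimulator(["book flight", "add hotel"])
agent = Agent({"book flight": lambda: "flight FL123 booked"})
dialogue = simulate_dialogue(user, agent)
# Produces one (request, reply, feedback) triple per sub-request,
# yielding an extended multi-turn exchange rather than a single
# "solve everything in one shot" trajectory.
```

Because the user holds back later sub-requests until earlier ones are answered, the agent cannot collapse the conversation into a single task-solving turn.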
Why does it matter?
This work is important because it allows for the creation of much better training data for AI agents. By simulating realistic user behavior, the agents can learn to be more helpful, interactive, and adaptable, ultimately leading to AI assistants that are more useful and feel more natural to interact with.
Abstract
The recent paradigm shift toward large reasoning models (LRMs) as autonomous agents has intensified the demand for sophisticated, multi-turn tool-use capabilities. Yet, existing datasets and data-generation approaches are limited by static, predefined toolsets that cannot scale to the complexity of open-ended human-agent collaboration. To address this, we initially developed a framework for automated task-oriented multi-turn dialogue generation at scale, utilizing an LRM-based simulator to dynamically generate high-value, domain-specific tools to solve specified tasks. However, we observe that a purely task-oriented design often results in "solely task-solving" trajectories, where the agent completes the objective with minimal interaction, failing to generate the high turn-count conversations seen in realistic scenarios. To bridge this gap, we shift toward a user-oriented simulation paradigm. By decoupling task generation from a dedicated user simulator that mimics human behavioral rules, such as incremental request-making and turn-by-turn feedback, we facilitate more authentic, extended multi-turn dialogues that reflect the iterative nature of real-world problem solving. Our generation pipeline operates as a versatile, plug-and-play module capable of initiating generation from any state, ensuring high scalability in producing extended tool-use data. Furthermore, by facilitating multiple task completions within a single trajectory, it yields a high-density dataset that reflects the multifaceted demands of real-world human-agent interaction.
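The "high-density" property the abstract closes on — several task completions packed into one trajectory — can be made concrete with a sketch of what such a record might look like. The field names and helper below are illustrative assumptions, not the paper's actual data format.

```python
# Hypothetical layout for one generated trajectory; field names are
# illustrative assumptions, not the paper's released schema.
trajectory = {
    "turns": [
        {"role": "user",  "text": "find me a dataset on tool use"},
        {"role": "agent", "tool_call": "search_datasets", "text": "found 3 candidates"},
        {"role": "user",  "text": "now summarize the first one"},
        {"role": "agent", "tool_call": "summarize", "text": "it contains ..."},
    ],
    # More than one task completed inside a single conversation:
    "tasks_completed": ["dataset_search", "summarization"],
}

def task_density(traj):
    """Tasks completed per trajectory -- the 'high-density' measure."""
    return len(traj["tasks_completed"])

def resume_state(traj, turn_index):
    """Plug-and-play style resumption: take any prefix of turns as the
    state from which generation could continue."""
    return traj["turns"][:turn_index]
```

A pipeline that can call something like `resume_state` on any prefix, then keep generating, is what lets generation be initiated "from any state" rather than always starting from an empty conversation.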