Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
Emre Can Acikgoz, Jeremiah Greer, Akul Datta, Ze Yang, William Zeng, Oussama Elachqar, Emmanouil Koukoumidis, Dilek Hakkani-Tür, Gokhan Tur
2025-02-18
Summary
This paper introduces CALM, a new AI model that can both hold natural multi-turn conversations and use tools effectively, combining two skills that were previously hard to achieve in a single system.
What's the problem?
Current AI systems are either good at having multi-turn conversations (like chatbots) or good at using tools and APIs (like digital assistants), but not both. This limits how helpful they can be in real-world situations where both skills are needed.
What's the solution?
The researchers created CALM, a model that learns conversation skills and tool use at the same time. They built a special training dataset called CALM-IT that teaches the AI how to blend conversation with tool use. They then trained three versions of CALM, ranging up to 405 billion parameters, and tested them on several benchmarks.
Why does it matter?
This matters because it could lead to more versatile and helpful AI assistants that can maintain a natural conversation while also performing complex tasks using tools or APIs. CALM outperformed other top models, including GPT-4o, showing that it's possible to create AI that's good at both talking and doing, which could make digital assistants much more useful in our daily lives.
Abstract
Large Language Models (LLMs) with API-calling capabilities enabled building effective Language Agents (LA), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs, requiring new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent over multi-turn conversations. Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA), and our analyses reveal that specialized approaches excel in one domain but underperform in the other. To bridge this chasm, we introduce CALM (Conversational Agentic Language Model), a unified approach that integrates both conversational and agentic capabilities. We created CALM-IT, a carefully constructed multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage. Using CALM-IT, we train three models, CALM 8B, CALM 70B, and CALM 405B, which outperform top domain-specific models, including GPT-4o, across all three benchmarks.
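To make the "interleaved ReAct reasoning with API usage" idea concrete, here is a minimal sketch of a single agent turn that alternates Thought/Action steps with a conversational reply. This is purely illustrative: the tool registry, the scripted stand-in model, and all function names are assumptions for the sketch, not APIs or data from the paper.

```python
import json

# Toy tool registry standing in for real APIs (e.g., hotel search in MultiWOZ).
TOOLS = {
    "find_hotel": lambda area, stars: [{"name": "City Inn",
                                        "area": area, "stars": stars}],
}

def scripted_model(history):
    """Stand-in for the LLM: emits either a Thought+Action or a final reply."""
    last = history[-1]
    if last["role"] == "user":
        return {"thought": "User wants a hotel; call find_hotel.",
                "action": {"tool": "find_hotel",
                           "args": {"area": "centre", "stars": 4}}}
    # After a tool observation, answer the user conversationally.
    obs = json.loads(last["content"])
    h = obs[0]
    return {"response": f"I found {h['name']}, a {h['stars']}-star "
                        f"hotel in the {h['area']}."}

def run_turn(history):
    """One user turn: reason, optionally call a tool, then reply."""
    while True:
        step = scripted_model(history)
        if "action" in step:
            # Execute the API call and feed the observation back in.
            tool = TOOLS[step["action"]["tool"]]
            obs = tool(**step["action"]["args"])
            history.append({"role": "tool", "content": json.dumps(obs)})
        else:
            history.append({"role": "assistant", "content": step["response"]})
            return step["response"]

history = [{"role": "user",
            "content": "Find me a 4-star hotel in the centre."}]
reply = run_turn(history)
print(reply)  # → I found City Inn, a 4-star hotel in the centre.
```

The point of the sketch is the control flow: each user turn can expand into several reason/act/observe steps before the assistant replies, which is the trajectory shape that a dataset like CALM-IT would supervise.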