TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling
Seungheon Doh, Keunwoo Choi, Juhan Nam
2025-10-06
Summary
This paper introduces a new way to build music recommendation systems with large language models: instead of relying *only* on the language model's own abilities, the system combines it with more traditional methods like database filtering and search.
What's the problem?
Current music recommendation systems built on large language models are pretty good at understanding what you *say* you want, but they don't always make effective use of other important information like song genre, artist, or release year. They often ignore simpler but useful ways to narrow down choices, like directly filtering for specific attributes.
What's the solution?
The researchers created a system where the large language model acts like a manager. When you ask for a song, the language model figures out *which* tools to use to find the best matches – things like searching a database with keywords, looking for songs similar to ones you already like, or directly filtering by criteria you specify. It then decides the order in which to use these tools and what information to give them, essentially planning the whole search process.
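The "manager" idea above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual implementation: the tool names, the toy catalog, and the hard-coded plan (which a real system would have the LLM generate) are all hypothetical.

```python
# Toy music catalog standing in for a real database.
CATALOG = [
    {"id": 1, "title": "Song A", "genre": "jazz", "year": 1995, "tags": ["sax", "mellow"]},
    {"id": 2, "title": "Song B", "genre": "jazz", "year": 2021, "tags": ["piano", "upbeat"]},
    {"id": 3, "title": "Song C", "genre": "rock", "year": 2021, "tags": ["guitar"]},
]

def filter_by_attributes(tracks, **attrs):
    """Boolean/metadata filter, e.g. by genre or release year."""
    return [t for t in tracks if all(t.get(k) == v for k, v in attrs.items())]

def keyword_search(tracks, keywords):
    """Keep tracks whose tags overlap the query keywords."""
    kws = set(keywords)
    return [t for t in tracks if kws & set(t["tags"])]

TOOLS = {"filter_by_attributes": filter_by_attributes, "keyword_search": keyword_search}

# A plan the LLM might produce for "recent jazz with piano":
# which tools to call, in what order, with what arguments.
plan = [
    ("filter_by_attributes", {"genre": "jazz", "year": 2021}),
    ("keyword_search", {"keywords": ["piano"]}),
]

results = CATALOG
for tool_name, args in plan:
    results = TOOLS[tool_name](results, **args)

print([t["title"] for t in results])  # → ['Song B']
```

Each tool narrows the candidate set, so ordering matters: filtering on cheap metadata first shrinks the pool before any more expensive search runs.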
Why it matters?
This approach is important because it allows recommendation systems to be more flexible and accurate. By combining the power of language understanding with the efficiency of traditional methods, the system can handle a wider range of requests and provide better recommendations, paving the way for more natural and helpful conversational music experiences.
Abstract
While the recent developments in large language models (LLMs) have successfully enabled generative recommenders with natural language interactions, their recommendation behavior is limited, leaving other simpler yet crucial components such as metadata or attribute filtering underutilized in the system. We propose an LLM-based music recommendation system with tool calling to serve as a unified retrieval-reranking pipeline. Our system positions an LLM as an end-to-end recommendation system that interprets user intent, plans tool invocations, and orchestrates specialized components: boolean filters (SQL), sparse retrieval (BM25), dense retrieval (embedding similarity), and generative retrieval (semantic IDs). Through tool planning, the system predicts which types of tools to use, their execution order, and the arguments needed to find music matching user preferences, supporting diverse modalities while seamlessly integrating multiple database filtering methods. We demonstrate that this unified tool-calling framework achieves competitive performance across diverse recommendation scenarios by selectively employing appropriate retrieval methods based on user queries, envisioning a new paradigm for conversational music recommendation systems.
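The abstract names four tool families: boolean filters (SQL), sparse retrieval (BM25), dense retrieval (embedding similarity), and generative retrieval (semantic IDs). As an illustration of the dense-retrieval component alone, a cosine-similarity lookup over toy embeddings might look like the following; the track names and vectors are made up, and a real system would get embeddings from an audio or text encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy track embeddings (in practice produced by a learned encoder).
EMBEDDINGS = {
    "Song A": [0.9, 0.1, 0.0],
    "Song B": [0.1, 0.8, 0.2],
    "Song C": [0.0, 0.2, 0.9],
}

def dense_retrieve(query_vec, k=2):
    """Rank tracks by embedding similarity to the query vector."""
    scored = sorted(EMBEDDINGS.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

print(dense_retrieve([0.85, 0.15, 0.05]))  # → ['Song A', 'Song B']
```

In the unified pipeline the abstract describes, a result list like this would be one candidate source among several, with the LLM deciding when embedding similarity is the right tool for a given query versus, say, a metadata filter.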