RecGPT-V2 Technical Report
Chao Yi, Dian Chen, Gaoyang Guo, Jiakai Tang, Jian Wu, Jing Yu, Mao Zhang, Wen Chen, Wenjun Yang, Yujie Luo, Yuning Jiang, Zhujin Gao, Bo Zheng, Binbin Cao, Changfa Wu, Dixuan Wang, Han Wu, Haoyi Hu, Kewei Zhu, Lang Tian, Lin Yang, Qiqi Huang
2025-12-17
Summary
This paper introduces RecGPT-V2, an improved version of a system that uses large language models (LLMs) to make better recommendations. It moves beyond simply noticing what you've clicked on to actually trying to understand *why* you might want something, like a more thoughtful friend suggesting a movie.
What's the problem?
The first version, RecGPT-V1, was a good start, but it had some issues. It took a lot of computing power and often repeated itself when trying to figure out your interests. The explanations it gave for recommendations weren't very varied, and it didn't adapt well to new situations. Finally, the way it was tested didn't really reflect what people actually care about when choosing something.
What's the solution?
RecGPT-V2 tackles these problems in four main ways. First, it uses a team of 'agents' working together to reason about your interests, which is more efficient and covers more possibilities. Second, it creates more diverse explanations by dynamically adjusting how it asks questions to the LLM. Third, it uses a special type of learning called 'constrained reinforcement learning' to help the system make better predictions and give explanations people agree with. Finally, it evaluates the system in a more complex way, mimicking how a person would judge a recommendation.
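The "constrained reinforcement learning" idea above can be illustrated with a minimal sketch: one reward (tag prediction quality) is maximized, while a second signal (explanation acceptance) is protected by a penalty whenever it falls below a floor, so the policy cannot trade one objective away for the other. The function name, floor, and penalty weight here are illustrative assumptions, not RecGPT-V2's actual formulation.

```python
# Hedged sketch of combining two reward signals under a constraint.
# All names and thresholds are illustrative, not the paper's API.

def constrained_reward(tag_accuracy: float,
                       explanation_acceptance: float,
                       acceptance_floor: float = 0.6,
                       penalty_weight: float = 2.0) -> float:
    """Reward tag accuracy, but penalize the policy whenever
    explanation acceptance drops below a minimum floor."""
    violation = max(0.0, acceptance_floor - explanation_acceptance)
    return tag_accuracy - penalty_weight * violation

# A candidate that sacrifices acceptance is penalized:
balanced = constrained_reward(0.80, 0.70)  # no violation -> 0.80
greedy = constrained_reward(0.90, 0.40)    # violation 0.2 -> 0.50
```

Under this shaping, the "greedy" candidate with higher raw accuracy scores worse overall, which is the intuition behind mitigating multi-reward conflicts.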
Why does it matter?
This research shows that using LLMs for recommendations isn't just a cool idea, but something that can actually work well in the real world. Tests on the Taobao platform showed significant improvements in things like click-through rates and how much time people spent browsing, proving that this technology can be commercially successful and provide a better experience for users.
Abstract
Large language models (LLMs) have demonstrated remarkable potential in transforming recommender systems from implicit behavioral pattern matching to explicit intent reasoning. While RecGPT-V1 successfully pioneered this paradigm by integrating LLM-based reasoning into user interest mining and item tag prediction, it suffers from four fundamental limitations: (1) computational inefficiency and cognitive redundancy across multiple reasoning routes; (2) insufficient explanation diversity in fixed-template generation; (3) limited generalization under supervised learning paradigms; and (4) simplistic outcome-focused evaluation that fails to match human standards. To address these challenges, we present RecGPT-V2 with four key innovations. First, a Hierarchical Multi-Agent System restructures intent reasoning through coordinated collaboration, eliminating cognitive duplication while enabling diverse intent coverage. Combined with Hybrid Representation Inference that compresses user-behavior contexts, our framework reduces GPU consumption by 60% and improves exclusive recall from 9.39% to 10.99%. Second, a Meta-Prompting framework dynamically generates contextually adaptive prompts, improving explanation diversity by +7.3%. Third, constrained reinforcement learning mitigates multi-reward conflicts, achieving +24.1% improvement in tag prediction and +13.0% in explanation acceptance. Fourth, an Agent-as-a-Judge framework decomposes assessment into multi-step reasoning, improving human preference alignment. Online A/B tests on Taobao demonstrate significant improvements: +2.98% CTR, +3.71% IPV, +2.19% TV, and +11.46% NER. RecGPT-V2 establishes both the technical feasibility and commercial viability of deploying LLM-powered intent reasoning at scale, bridging the gap between cognitive exploration and industrial utility.
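The Agent-as-a-Judge framework described in the abstract decomposes assessment into multi-step reasoning rather than a single outcome score. A minimal sketch of that decomposition, with criteria, the toy scorer, and all names being illustrative assumptions (in practice each step would wrap an LLM call):

```python
# Hedged sketch: judge an explanation criterion-by-criterion, then
# aggregate, instead of emitting one outcome score. Criteria names
# and the toy scorer are illustrative assumptions.
from typing import Callable, Dict

CRITERIA = ("intent_relevance", "factual_grounding", "persuasiveness")

def judge_explanation(explanation: str,
                      score_step: Callable[[str, str], float]) -> Dict[str, float]:
    """Score the explanation on each criterion separately and average.
    `score_step(criterion, text)` would be an LLM judge call in practice."""
    scores = {c: score_step(c, explanation) for c in CRITERIA}
    scores["overall"] = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    return scores

# Toy scorer standing in for an LLM judge:
toy = lambda criterion, text: 1.0 if criterion[0] in text else 0.5
result = judge_explanation("it fits your interest", toy)
```

Because each criterion is scored explicitly, disagreements with human raters can be traced to a specific reasoning step rather than an opaque aggregate number.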