RecoWorld: Building Simulated Environments for Agentic Recommender Systems
Fei Liu, Xinyu Lin, Hanchao Yu, Mingyuan Wu, Jianyu Wang, Qiang Zhang, Zhuokai Zhao, Yinglong Xia, Yao Zhang, Weiwei Li, Mingze Gao, Qifan Wang, Lizhu Zhang, Benyu Zhang, Xiangjun Fan
2025-09-19
Summary
This paper introduces RecoWorld, a blueprint for building realistic simulated environments where recommender systems that act like agents can practice. It's like a flight simulator: instead of training pilots, it trains recommendation software to suggest things people will actually like.
What's the problem?
Training these 'agentic' recommender systems is currently difficult because you can't just let them learn by making mistakes on real users: bad recommendations annoy people and ruin the experience. These systems need a safe space where they can experiment and improve without negatively affecting anyone.
What's the solution?
RecoWorld creates that safe space. It has two main parts: a 'simulated user' that behaves like a real person browsing for content, and an 'agentic recommender' that tries to suggest items to that user. The two interact over multiple turns, and the simulated user gives feedback that goes beyond a simple 'like' or 'dislike': it also explains *why* it feels a certain way, and when it is close to losing interest, it issues an instruction telling the recommender what to change. The recommender incorporates this feedback to better understand what the user wants, drawing on the reasoning abilities of large language models. RecoWorld can also simulate different groups of users to test how well the recommender works for everyone.
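To make this loop concrete, here is a minimal, self-contained sketch of the dual-view setup: a simulated user reviews each recommended item, explains its reaction, and issues a reflective instruction when it is about to disengage, while the recommender adapts its next suggestion to that instruction. The class names, the feedback format, and the keyword-matching logic are illustrative assumptions only; the actual simulator and recommender rely on LLM-based reasoning rather than string matching.

```python
# Hypothetical sketch of RecoWorld's dual-view interaction loop.
# Names, signatures, and the string-matching logic are assumptions for
# illustration; the real simulator and recommender are LLM-driven.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Feedback:
    liked: bool                 # coarse signal: like / dislike
    reason: str                 # natural-language explanation of *why*
    instruction: Optional[str]  # reflective instruction near disengagement


@dataclass
class SimulatedUser:
    interests: list
    patience: int = 3           # poor turns tolerated before disengaging
    history: list = field(default_factory=list)

    def review(self, item: str) -> Feedback:
        liked = any(topic in item for topic in self.interests)
        if liked:
            self.patience = min(self.patience + 1, 3)  # engagement recovers
        else:
            self.patience -= 1
        reason = "matches my interests" if liked else "feels off-topic to me"
        instruction = None
        if self.patience <= 1:
            # About to disengage: tell the recommender what to change.
            instruction = f"Show me more about {self.interests[0]}, less of this."
        feedback = Feedback(liked, reason, instruction)
        self.history.append(feedback)
        return feedback


class AgenticRecommender:
    """Adapts its next pick using the user's explanation and instruction."""

    def __init__(self, catalog: list):
        self.catalog = catalog
        self.focus = None

    def recommend(self, last_feedback: Optional[Feedback]) -> str:
        if last_feedback and last_feedback.instruction:
            # Honor the reflective instruction (a real system would use an LLM).
            self.focus = last_feedback.instruction.split("about ")[1].split(",")[0]
        for item in self.catalog:
            if self.focus is None or self.focus in item:
                return item
        return self.catalog[0]


# One short multi-turn episode.
user = SimulatedUser(interests=["astronomy"])
recommender = AgenticRecommender(
    catalog=["celebrity gossip roundup", "astronomy: a new exoplanet is found"]
)
feedback = None
for turn in range(4):
    item = recommender.recommend(feedback)
    feedback = user.review(item)
    print(f"turn {turn}: {item!r} liked={feedback.liked} "
          f"instruction={feedback.instruction!r}")
```

In RecoWorld itself, the 'review' and 'recommend' steps would be LLM calls over text-based, multimodal, or semantic ID item representations; the loop structure is the part this sketch is meant to show.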
Why it matters?
This work is important because it's a first step towards recommender systems that work *with* users, instead of just *at* them. Imagine a system where you can tell it what you're looking for, and it responds intelligently, learning from your instructions to provide even better suggestions. RecoWorld helps make that kind of collaborative, personalized experience possible, ultimately leading to more engaging and satisfying interactions.
Abstract
We present RecoWorld, a blueprint for building simulated environments tailored to agentic recommender systems. Such environments give agents a proper training space where they can learn from errors without impacting real users. RecoWorld distinguishes itself with a dual-view architecture: a simulated user and an agentic recommender engage in multi-turn interactions aimed at maximizing user retention. The user simulator reviews recommended items, updates its mindset, and when sensing potential user disengagement, generates reflective instructions. The agentic recommender adapts its recommendations by incorporating these user instructions and reasoning traces, creating a dynamic feedback loop that actively engages users. This process leverages the exceptional reasoning capabilities of modern LLMs. We explore diverse content representations within the simulator, including text-based, multimodal, and semantic ID modeling, and discuss how multi-turn RL enables the recommender to refine its strategies through iterative interactions. RecoWorld also supports multi-agent simulations, allowing creators to simulate the responses of targeted user populations. It marks an important first step toward recommender systems where users and agents collaboratively shape personalized information streams. We envision new interaction paradigms where "user instructs, recommender responds," jointly optimizing user retention and engagement.
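As one way to picture how multi-turn RL could drive the retention objective described in the abstract, here is a small hedged sketch: each turn the simulated user stays engaged yields a positive reward, the episode ends on disengagement, and the recommender's return is the discounted sum over turns. The reward shape and discount factor are assumptions for illustration, not RecoWorld's stated formulation.

```python
# Hedged sketch of a retention-based return for one simulated multi-turn
# episode; the reward of 1.0 per engaged turn and the discount factor are
# illustrative assumptions, not the paper's actual objective.
def episode_return(engaged_per_turn: list, gamma: float = 0.9) -> float:
    """Discounted retention return: reward 1.0 for each turn the user stays."""
    total = 0.0
    for t, engaged in enumerate(engaged_per_turn):
        if not engaged:
            break                 # simulated user disengages; episode ends
        total += gamma ** t       # reward of 1.0, discounted by turn index
    return total


# A recommender that adapts to reflective instructions keeps the user longer
# and therefore earns a higher return than one that ignores them.
print(episode_return([True, True, True, True]))     # longer engagement -> higher return
print(episode_return([True, False, False, False]))  # early disengagement -> lower return
```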