OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, Guorui Zhou

2025-03-04

Summary

This paper introduces OneRec, a recommendation system that suggests items such as videos or products using a single, unified generative model instead of the usual multi-step pipeline.

What's the problem?

Most recommendation systems use a two-step process called 'retrieve-and-rank': they first retrieve a set of candidate items and then rank them. This cascaded pipeline can be slow and complex, and it can be less accurate than a single end-to-end model. It also struggles to fully capture user preferences and provide personalized suggestions.

What's the solution?

The researchers developed OneRec, which combines retrieval and ranking into one step using a generative model. It uses an encoder-decoder structure to encode a user's past behavior and decode the items they might like next. They also introduced session-wise generation, which produces a whole session of recommendations together instead of predicting one item at a time. Finally, they added a reward model and a preference alignment step to improve how well the generated results match user preferences.
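As a rough illustration of how a reward model can turn generated sessions into training signal, the sketch below scores several candidate sessions and keeps the best and worst as a preference pair. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the best-versus-worst selection rule, and the toy scoring function are all hypothetical, and a real reward model would be learned from user feedback.

```python
from typing import Callable, Sequence, Tuple

# Assumption: a session is an ordered tuple of item ids.
Session = Tuple[int, ...]


def build_preference_pair(
    candidates: Sequence[Session],
    reward_fn: Callable[[Session], float],
) -> Tuple[Session, Session]:
    """Return (chosen, rejected): the highest- and lowest-scoring candidates."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidate sessions")
    ranked = sorted(candidates, key=reward_fn)  # ascending by reward
    return ranked[-1], ranked[0]


def toy_reward(session: Session) -> float:
    """Hypothetical stand-in for a learned reward model.

    Scores a session by the fraction of even item ids, purely for illustration.
    """
    return sum(1 for item in session if item % 2 == 0) / len(session)


# Three candidate sessions that a generator might have produced for one request.
candidates = [(1, 3, 5), (2, 4, 6), (1, 2, 3)]
chosen, rejected = build_preference_pair(candidates, toy_reward)
print("chosen:", chosen, "rejected:", rejected)
```

The point of the pairing step is that the platform only ever shows one result per request, so the "rejected" example has to come from a simulated score rather than from a second real impression.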

Why does it matter?

This matters because OneRec simplifies the recommendation process while improving accuracy and personalization. When deployed in the main scene of Kuaishou, a popular video platform, it outperformed the existing system and increased watch time by 1.6%. This could lead to more engaging and efficient recommendation systems for apps and websites that millions of people use every day.

Abstract

Recently, generative retrieval-based recommendation systems have emerged as a promising paradigm. However, most modern recommender systems adopt a retrieve-and-rank strategy, where the generative model functions only as a selector during the retrieval stage. In this paper, we propose OneRec, which replaces the cascaded learning framework with a unified generative model. To the best of our knowledge, this is the first end-to-end generative model that significantly surpasses current complex and well-designed recommender systems in real-world scenarios. Specifically, OneRec includes: 1) an encoder-decoder structure, which encodes the user's historical behavior sequences and gradually decodes the videos that the user may be interested in. We adopt sparse Mixture-of-Experts (MoE) to scale model capacity without proportionally increasing computational FLOPs. 2) a session-wise generation approach. In contrast to traditional next-item prediction, we propose a session-wise generation, which is more elegant and contextually coherent than point-by-point generation that relies on hand-crafted rules to properly combine the generated results. 3) an Iterative Preference Alignment module combined with Direct Preference Optimization (DPO) to enhance the quality of the generated results. Unlike DPO in NLP, a recommendation system typically has only one opportunity to display results for each user's browsing request, making it impossible to obtain positive and negative samples simultaneously. To address this limitation, we design a reward model to simulate user generation and customize the sampling strategy. Extensive experiments have demonstrated that a limited number of DPO samples can align user interest preferences and significantly improve the quality of generated results. We deployed OneRec in the main scene of Kuaishou, achieving a 1.6% increase in watch-time, which is a substantial improvement.
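For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single (chosen, rejected) pair: the negative log-sigmoid of beta times the difference between the policy's and the reference model's log-probability margins. It is a minimal illustration, not the paper's training code. The function name, the scalar log-probability inputs, and the default beta = 0.1 are assumptions made here; the abstract does not state the hyperparameters or how the loss is combined with other objectives.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the total log-probability that a model assigns to a
    generated session (summed over its tokens).
    """
    # How much more the policy favors each session than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# When the policy raises the chosen session and lowers the rejected one,
# the loss drops below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

In an iterative setup like the one the abstract describes, the chosen and rejected sessions would come from the reward model's scores on sessions sampled from the current model, and the loss would be minimized with respect to the policy's parameters.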
