RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI

Hongzhi Zang, Shu'ang Yu, Hao Lin, Tianxing Zhou, Zefang Huang, Zhen Guo, Xin Xu, Jiakai Zhou, Yuze Sheng, Shizhe Zhang, Feng Gao, Wenhao Tang, Yufeng Yue, Quanlu Zhang, Xinlei Chen, Chao Yu, Yu Wang

2026-02-10

Summary

This paper introduces USER, a new system designed to make it easier to teach robots to perform tasks directly in the real world, rather than just in computer simulations.

What's the problem?

Teaching robots in the real world is much harder than in simulation: you can't speed up time, cheaply reset the robot to its starting position, or run many copies of it in parallel to gather data quickly. As a result, collecting enough data for effective learning, managing different kinds of robots, and training them on complex, long-horizon tasks all become very difficult. The obstacles are not just in the learning algorithms themselves but in the practical challenges of working with physical hardware.

What's the solution?

The researchers built USER, which treats physical robots as managed hardware resources, much like GPUs, and schedules them all together. It automatically discovers and organizes robots and efficiently handles communication between the robots and the computers doing the learning. USER also lets training continue even if a robot crashes, and it stores past experiences so they can be reused in future learning. It is designed to work with different learning methods and kinds of robot control policies, including large models that use both images and language.
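To make the "robots as managed hardware resources" idea concrete, here is a minimal sketch of what such a resource pool could look like. All names (`RobotResource`, `RobotRegistry`, etc.) are illustrative assumptions, not the paper's actual API: the point is only that robots are registered, handed out, and returned the way a scheduler hands out GPU devices, with crashed robots kept out of the pool so learning continues on the remaining hardware.

```python
import queue
import threading

class RobotResource:
    """A physical robot treated as a schedulable hardware resource (hypothetical)."""
    def __init__(self, robot_id, kind):
        self.robot_id = robot_id
        self.kind = kind          # e.g. "arm", "mobile-base"
        self.healthy = True

class RobotRegistry:
    """Discovers heterogeneous robots and schedules them for data-collection jobs."""
    def __init__(self):
        self._available = queue.Queue()
        self._lock = threading.Lock()
        self._all = {}

    def register(self, robot):
        # An automatic-discovery mechanism would call this when a robot joins.
        with self._lock:
            self._all[robot.robot_id] = robot
        self._available.put(robot)

    def acquire(self, timeout=None):
        # Blocks until a robot is free, mirroring how a GPU pool hands out devices.
        return self._available.get(timeout=timeout)

    def release(self, robot):
        # A crashed (unhealthy) robot is not returned to the pool, so the
        # rest of the fleet keeps collecting data.
        if robot.healthy:
            self._available.put(robot)

registry = RobotRegistry()
registry.register(RobotResource("arm-0", "arm"))
registry.register(RobotResource("base-0", "mobile-base"))
r = registry.acquire()
registry.release(r)
```

This is just one way to express the abstraction; the real system also has to handle network discovery and scheduling policies, which are omitted here.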

Why it matters?

USER provides a solid foundation for building more intelligent robots that can learn and operate effectively in the real world. It allows researchers to coordinate multiple robots, use advanced models, and train robots for extended periods, ultimately paving the way for robots that can perform complex tasks in everyday environments.

Abstract

Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, and long-horizon effective training difficult. These challenges suggest that real-world policy learning is not only an algorithmic issue but fundamentally a systems problem. We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots. To address cloud-edge communication, USER introduces an adaptive communication plane with tunneling-based networking, distributed data channels for traffic localization, and streaming-multiprocessor-aware weight synchronization to regulate GPU-side overhead. On top of this infrastructure, USER organizes learning as a fully asynchronous framework with a persistent, cache-aware buffer, enabling efficient long-horizon experiments with robust crash recovery and reuse of historical data. In addition, USER provides extensible abstractions for rewards, algorithms, and policies, supporting online imitation or reinforcement learning of CNN/MLP, generative policies, and large vision-language-action (VLA) models within a unified pipeline. Results in both simulation and the real world show that USER enables multi-robot coordination, heterogeneous manipulators, edge-cloud collaboration with large models, and long-running asynchronous training, offering a unified and extensible systems foundation for real-world online policy learning.
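The abstract's "persistent, cache-aware buffer" with crash recovery and reuse of historical data can be sketched, under stated assumptions, as an append-only experience log that survives process restarts. The class and file layout below (`PersistentBuffer`, a JSON-lines file) are hypothetical illustrations, not the paper's implementation: they show only the core property that every episode is durable once written, so a long-running asynchronous trainer can reload past trajectories after a crash.

```python
import json
import os
import tempfile

class PersistentBuffer:
    """Append-only, on-disk experience buffer (illustrative sketch)."""
    def __init__(self, path):
        self.path = path
        self.episodes = []
        if os.path.exists(path):
            # Crash recovery: reload historical data written by earlier runs.
            with open(path) as f:
                self.episodes = [json.loads(line) for line in f]

    def append(self, episode):
        # Each episode is durable as soon as it is written, so data-collecting
        # actors and the learner can proceed fully asynchronously.
        self.episodes.append(episode)
        with open(self.path, "a") as f:
            f.write(json.dumps(episode) + "\n")

path = os.path.join(tempfile.mkdtemp(), "buffer.jsonl")
buf = PersistentBuffer(path)
buf.append({"obs": [0.1], "action": [0.0], "reward": 1.0})
recovered = PersistentBuffer(path)  # simulate a restart after a crash
```

A production buffer would also need indexing for cache-aware sampling and concurrent-writer safety, which this sketch deliberately leaves out.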