RISE: Self-Improving Robot Policy with Compositional World Model
Jiazhi Yang, Kunyang Lin, Jinwei Li, Wencong Zhang, Tianwei Lin, Longyan Wu, Zhizhong Su, Hao Zhao, Ya-Qin Zhang, Li Chen, Ping Luo, Xiangyu Yue, Hongyang Li
2026-02-13
Summary
This paper introduces RISE, a new way to teach robots complex tasks like manipulating objects. It uses reinforcement learning, but most of the learning happens inside a learned 'imagination' rather than through direct trial and error in the real world.
What's the problem?
Robots struggle with tasks that require precise movements and the ability to adapt to changes, such as stacking blocks or packing a backpack. Traditional reinforcement learning, where a robot learns by trial and error, is too risky and expensive to apply directly in these settings: mistakes can damage the robot or its surroundings, and the environment has to be reset between attempts, which takes time.
What's the solution?
RISE solves this by creating a 'world model' inside the robot's system. This model lets the robot *imagine* what will happen if it tries different actions. It predicts future states and also estimates how good those outcomes will be. The robot then uses these imagined experiences to improve its control policy, essentially learning from its 'imagination' instead of constant real-world attempts. The system is designed so that the parts that predict the future and evaluate outcomes can be optimized separately for best performance.
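To make this loop concrete, here is a minimal, hypothetical sketch in Python. The `dynamics_model`, `progress_value`, and linear-Gaussian policy below are toy stand-ins (the paper's components are learned neural networks and a VLA policy), but the loop follows the same pattern: roll out the policy in imagination, score each imagined step by the progress it makes, and update the policy without touching the real robot.

```python
"""Toy sketch of self-improvement in imagination (assumed names, not the paper's code)."""
import numpy as np

rng = np.random.default_rng(0)
GOAL = np.ones(4)                 # toy goal state
theta = np.zeros((4, 4))          # policy parameters (linear mean of a Gaussian policy)
SIGMA, LR, HORIZON = 0.3, 0.05, 10

def dynamics_model(state, action):
    # Controllable dynamics: imagine the next state for a candidate action.
    return state + 0.1 * action + 0.01 * rng.normal(size=state.shape)

def progress_value(state):
    # Progress value: higher means closer to completing the (toy) task.
    return -np.linalg.norm(state - GOAL)

for iteration in range(200):
    # 1) Imagined rollout: no physical interaction, only model predictions.
    state = rng.normal(size=4)
    trajectory = []
    for _ in range(HORIZON):
        action = theta @ state + SIGMA * rng.normal(size=4)
        next_state = dynamics_model(state, action)
        trajectory.append((state, action, next_state))
        state = next_state

    # 2) Advantage from the value model: progress gained by each imagined step.
    # 3) Policy improvement: a crude REINFORCE-style update on the imagined data.
    for s, a, s_next in trajectory:
        advantage = progress_value(s_next) - progress_value(s)
        grad_log_prob = np.outer((a - theta @ s) / SIGMA**2, s)
        theta += LR * advantage * grad_log_prob

print("final imagined progress:", progress_value(state))
```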
Why it matters?
This research matters because it lets robots learn complex manipulation skills more safely, quickly, and cheaply. By minimizing the need for physical interaction, RISE opens the door to robots handling more delicate and dynamic tasks in real-world environments, with significant improvements reported on brick sorting, backpack packing, and box closing.
Abstract
Despite sustained scaling of model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risks, hardware cost, and environment resets. To bridge this gap, we present RISE, a scalable framework for robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future states via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for policy improvement. This compositional design allows state prediction and value estimation to be handled by distinct, best-suited architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvements over prior art: more than +35% absolute performance gain in dynamic brick sorting, +45% in backpack packing, and +35% in box closing.
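The abstract's compositional split can be illustrated with a short, assumed interface sketch: the dynamics model and the progress value model are separate modules, so each can be trained with its own architecture and objective, and the advantage of an imagined action is simply the progress gained by taking it. Class and method names below are illustrative, not the paper's actual API.

```python
"""Sketch of a compositional world model interface (assumed names)."""
from dataclasses import dataclass
from typing import Protocol, Sequence
import numpy as np

class DynamicsModel(Protocol):
    def imagine(self, views: Sequence[np.ndarray], action: np.ndarray) -> Sequence[np.ndarray]:
        """Predict the next multi-view observation conditioned on an action."""

class ProgressValueModel(Protocol):
    def progress(self, views: Sequence[np.ndarray]) -> float:
        """Estimate task progress for an (imagined) observation."""

@dataclass
class CompositionalWorldModel:
    dynamics: DynamicsModel        # e.g. trained with a video-prediction objective
    value: ProgressValueModel      # e.g. trained with a progress-regression objective

    def advantage(self, views: Sequence[np.ndarray], action: np.ndarray):
        # Advantage of an imagined action = progress after the step minus progress before.
        next_views = self.dynamics.imagine(views, action)
        return self.value.progress(next_views) - self.value.progress(views), next_views
```

Keeping the two models behind separate interfaces is what lets each be swapped or retrained independently, which is the point the abstract makes about tailoring architectures and objectives per component.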