Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation
Huajie Tan, Sixiang Chen, Yijie Xu, Zixiao Wang, Yuheng Ji, Cheng Chi, Yaoxu Lyu, Zhongxia Zhao, Xiansheng Chen, Peterson Co, Shaoxuan Xie, Guocai Yao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
2025-12-30
Summary
This paper tackles the challenge of teaching robots to perform complex tasks using reinforcement learning, specifically focusing on how to give the robot feedback – a 'reward' – for doing things correctly.
What's the problem?
Currently, giving robots good rewards is really hard. Existing methods struggle because they don't fully understand the steps involved in a task and often rely on only seeing things from one viewpoint, making it difficult to judge progress accurately. Also, the way these rewards are designed can unintentionally mislead the robot, causing it to learn the wrong things and get stuck in a 'semantic trap'.
What's the solution?
The researchers developed a system called Dopamine-Reward, which uses a 'General Reward Model' (GRM) trained on a huge amount of robot data. This model understands tasks step-by-step and combines information from multiple viewpoints to give more reliable feedback. They also created a new way to shape the rewards, called 'Policy-Invariant Reward Shaping', which ensures the robot learns the *right* way to do things without being steered off course. This is all combined into a framework called Dopamine-RL.
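The paper does not spell out the shaping formula in this summary, but the classic construction that provably preserves the optimal policy is potential-based reward shaping (Ng et al., 1999). The sketch below assumes the paper's Policy-Invariant Reward Shaping follows this form, with the reward model's progress estimate serving as the potential function Phi(s):

```python
# Sketch of potential-based reward shaping, the standard policy-invariant
# scheme. Assumption for illustration: the GRM's step-aware progress
# estimate plays the role of the potential Phi(s).

def shaped_reward(r, phi_s, phi_s_next, gamma=0.99):
    """Return r' = r + gamma * Phi(s') - Phi(s).

    Along any trajectory the shaping terms telescope, so every policy's
    return shifts by the same start-state constant and the optimal policy
    is unchanged -- the property that avoids the 'semantic trap' of
    ad-hoc dense rewards.
    """
    return r + gamma * phi_s_next - phi_s

# Example: a sparse task reward densified by a learned progress potential.
gamma = 0.99
rewards = [0.0, 0.0, 1.0]           # sparse: success signal only at the end
potentials = [0.2, 0.5, 0.8, 1.0]   # Phi(s_0..s_3), e.g. estimated progress
dense = [shaped_reward(r, potentials[t], potentials[t + 1], gamma)
         for t, r in enumerate(rewards)]
```

Because the shaping terms cancel in the discounted sum, the shaped return equals the original return plus `gamma**T * Phi(s_T) - Phi(s_0)`, identically for all policies.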
Why does it matter?
This work is important because it makes reinforcement learning much more practical for real-world robots. The system learns quickly: after seeing just one example of a task, the robot can improve from near-zero to a 95% success rate with only about 150 practice attempts, roughly an hour of real-robot interaction. This means robots can learn new skills more easily and adapt to different situations, opening up possibilities for automation in many areas.
Abstract
The primary obstacle for applying reinforcement learning (RL) to real-world robotics is the design of effective reward functions. While recently proposed learning-based Process Reward Models (PRMs) are a promising direction, they are often hindered by two fundamental limitations: their reward models lack step-aware understanding and rely on single-view perception, leading to unreliable assessments of fine-grained manipulation progress; and their reward shaping procedures are theoretically unsound, often inducing a semantic trap that misguides policy optimization. To address these issues, we introduce Dopamine-Reward, a novel reward modeling method for learning a general-purpose, step-aware process reward model from multi-view inputs. At its core is our General Reward Model (GRM), trained on a vast 3,400+ hour dataset, which leverages Step-wise Reward Discretization for structural understanding and Multi-Perspective Reward Fusion to overcome perceptual limitations. Building upon Dopamine-Reward, we propose Dopamine-RL, a robust policy learning framework that employs a theoretically sound Policy-Invariant Reward Shaping method, which enables the agent to leverage dense rewards for efficient self-improvement without altering the optimal policy, thereby fundamentally avoiding the semantic trap. Extensive experiments across diverse simulated and real-world tasks validate our approach. GRM achieves state-of-the-art accuracy in reward assessment, and Dopamine-RL built on GRM significantly improves policy learning efficiency. For instance, after GRM is adapted to a new task in a one-shot manner from a single expert trajectory, the resulting reward model enables Dopamine-RL to improve the policy from near-zero to 95% success with only 150 online rollouts (approximately 1 hour of real robot interaction), while retaining strong generalization across tasks. Project website: https://robo-dopamine.github.io
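The abstract names Step-wise Reward Discretization and Multi-Perspective Reward Fusion without giving formulas, so the following is a hypothetical sketch of one plausible reading: each camera view scores the discrete progress stages of a task, and the per-view scores are fused before picking a stage. The stage count, score representation, and mean-fusion rule are all illustrative assumptions, not the paper's specification.

```python
import numpy as np

# Illustrative sketch only: the abstract does not define these components,
# so the 10-stage discretization and mean-fusion below are assumptions.

def discretize_progress(progress, num_steps=10):
    """Map continuous task progress in [0, 1] to one of `num_steps`
    discrete stages (a guess at Step-wise Reward Discretization)."""
    return min(int(progress * num_steps), num_steps - 1)

def fuse_views(step_scores_per_view):
    """Average per-view scores over the discrete stages, then take the
    consensus stage (a guess at Multi-Perspective Reward Fusion).
    Averaging lets views with an occluded gripper be outvoted by views
    with a clear line of sight."""
    fused = np.mean(np.stack(step_scores_per_view), axis=0)
    return int(np.argmax(fused))
```

A design note on why fusion matters here: a single camera can report spurious progress when the object of interest is occluded, which is exactly the single-view failure mode the abstract attributes to prior PRMs.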