GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
Yunfei Li, Xiao Ma, Jiafeng Xu, Yu Cui, Zhongren Cui, Zhigang Han, Liqun Huang, Tao Kong, Yuxiao Liu, Hao Niu, Wanli Peng, Jingchao Qiao, Zeyu Ren, Haixin Shi, Zhi Su, Jiawen Tian, Yuyang Xiao, Shenyu Zhang, Liwei Zheng, Hang Li, Yonghui Wu
2025-12-02
Summary
This paper introduces GR-RL, a new approach for teaching robots complex, long-horizon tasks such as dexterous object manipulation. It builds on existing vision-language-action (VLA) systems, which learn to act from visual observations and language instructions, and extends them to handle tasks that demand many precise steps in sequence.
What's the problem?
Current robot learning systems typically rely on human demonstrations. For highly dexterous tasks, however, those demonstrations are often imperfect: they can be noisy, hesitant, or inefficient. Simply imitating such suboptimal demonstrations does not produce robots that can perform these complex actions reliably.
What's the solution?
GR-RL tackles this with a three-stage training pipeline. First, it filters the demonstrations, discarding segments that do not actually advance the task; it identifies useful steps by training an offline reinforcement-learning Q-function with a sparse reward and treating the resulting Q-values as a measure of task progress. Second, it applies morphological symmetry augmentation, which helps the policy generalize to mirrored and slightly varied situations. Finally, it fine-tunes the policy online, learning to predict and correct noise in a latent action space so the robot can compensate for small errors and achieve high precision.
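The filtering idea can be sketched in a few lines. Everything below is a hypothetical illustration rather than the paper's implementation: `q_fn` stands in for a learned Q-function used as a progress estimate, and the keep-if-progress-increases rule with a `margin` threshold is an assumed convention.

```python
# Minimal sketch of progress-based demonstration filtering.
# Assumption (not from the paper): q_fn is a learned Q-function mapping
# (observation, action) to a scalar that behaves like task progress.

def filter_transitions(trajectory, q_fn, margin=0.0):
    """Keep transitions that contribute positively to task progress.

    trajectory: list of (obs, action) pairs in temporal order.
    q_fn: callable (obs, action) -> float, treated as a progress estimate.
    margin: minimum required progress gain (hypothetical hyperparameter).
    """
    kept = []
    for (s, a), (s_next, a_next) in zip(trajectory, trajectory[1:]):
        progress_gain = q_fn(s_next, a_next) - q_fn(s, a)
        if progress_gain > margin:  # drop transitions that stall or regress
            kept.append((s, a))
    return kept

# Toy example: 1-D observations where Q simply reads out the state,
# so the backtracking segment of a noisy demo (0.2 -> 0.1) is dropped.
demo = [(0.0, 0), (0.2, 1), (0.1, 0), (0.5, 1), (0.9, 1)]
toy_q = lambda s, a: s
filtered = filter_transitions(demo, toy_q)
```

In a real pipeline the Q-function would come from offline RL with sparse reward, as the paper describes; the toy `toy_q` here only serves to make the filtering rule concrete.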
Why it matters?
This work is significant because it demonstrates, for the first time with a learning-based policy, a robot that can autonomously lace up a shoe, a task requiring planning many steps ahead, millimeter-level precision, and handling flexible materials like laces. It shows that a generalist robot policy can be specialized into a reliable real-world expert.
Abstract
We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. The assumption that human demonstrations are optimal is core to existing VLA policies. However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations via reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress function, filters the demonstration trajectories, and keeps only the transitions that contribute positively to progress. Specifically, we show that by directly applying offline RL with sparse reward, the resulting Q-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation, which greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent-space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets with an 83.3% success rate, a task requiring long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
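The morphological symmetry augmentation mentioned above can be illustrated with a toy sketch. The vector layout, function names, and sign-flip convention below are assumptions for illustration, not the paper's actual transformation; the underlying idea is to mirror a bimanual state/action pair across the robot's left-right axis, doubling the usable training data.

```python
import numpy as np

# Hypothetical convention: vectors are laid out as [left_arm | right_arm],
# and mirroring swaps the two halves, then flips the sign of lateral
# components (indices given in post-swap coordinates).

def mirror(x, flip_dims):
    """Mirror a [left | right] concatenated vector across the sagittal plane."""
    half = len(x) // 2
    out = np.concatenate([x[half:], x[:half]]).astype(float)  # swap halves
    out[list(flip_dims)] *= -1.0                              # flip lateral axes
    return out

def symmetry_augment(obs, action, obs_flip_dims, act_flip_dims):
    """Return a mirrored (obs, action) pair as an extra training sample."""
    return mirror(obs, obs_flip_dims), mirror(action, act_flip_dims)

# Toy bimanual layout: [left_x, left_y, right_x, right_y], where the
# y components flip sign when the scene is reflected left-right.
obs = np.array([0.3, 0.5, 0.7, -0.2])
act = np.array([0.1, 0.0, 0.2, 0.4])
m_obs, m_act = symmetry_augment(obs, act, obs_flip_dims=(1, 3), act_flip_dims=(1, 3))
```

A real robot would need a more careful transformation (joint orderings, camera views, and gripper states all have their own mirror images), but the sketch captures why the augmentation helps: every demonstration yields a second, physically valid trajectory for free.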