RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
Charles Xu, Qiyang Li, Jianlan Luo, Sergey Levine
2024-12-17

Summary
This paper introduces RLDG (Reinforcement Learning Distilled Generalists), a method that uses reinforcement learning to generate high-quality training data for finetuning robotic generalist policies, improving how robots learn to perform a variety of tasks.
What's the problem?
Generalist robot policies have become flexible enough to adapt to many different tasks, but their performance depends heavily on the quality of the data they are trained on. Traditional methods rely on human demonstrations, which can be inconsistent, suboptimal, or limited in coverage, especially for precise manipulation. This makes it hard for robots to learn effectively and to generalize their skills to new situations.
What's the solution?
RLDG addresses this issue by using reinforcement learning to create better training data. Instead of relying solely on human demonstrations, RLDG first trains specialized RL policies for individual tasks and then rolls them out to generate high-quality training data, which is used to finetune a generalist policy. Because the generalist learns from optimized actions and broader state coverage, its performance improves on precise manipulation tasks such as connector insertion and assembly: robots trained this way achieve up to 40% higher success rates than those trained only on human demonstrations. A toy sketch of this recipe appears below.
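
The following is a minimal, self-contained sketch of the RLDG recipe on a toy 1-D navigation problem with tabular Q-learning. The environment, function names, and hyperparameters are illustrative assumptions chosen for exposition; they are not the authors' code, which targets real-world manipulation with neural-network policies.

```python
# Toy illustration of the RLDG recipe: train task-specific RL "specialists",
# roll them out to generate data, then distill that data into one generalist
# policy with supervised learning (behavior cloning). The environment and all
# names here are illustrative assumptions, not the authors' released code.
import numpy as np

N = 7                      # 1-D corridor with positions 0..N-1
TASKS = {0: N - 1, 1: 0}   # task id -> goal position (right end / left end)
ACTIONS = (-1, +1)         # move left / move right

def step(pos, action_idx, goal):
    """Apply an action, return (next_pos, reward, done)."""
    nxt = int(np.clip(pos + ACTIONS[action_idx], 0, N - 1))
    done = nxt == goal
    return nxt, (1.0 if done else -0.01), done

def train_specialist(goal, episodes=500, eps=0.2, alpha=0.5, gamma=0.95, seed=0):
    """Tabular Q-learning: one RL specialist per task."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N, len(ACTIONS)))
    for _ in range(episodes):
        pos, done = int(rng.integers(N)), False
        for _ in range(4 * N):
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(q[pos].argmax())
            nxt, r, done = step(pos, a, goal)
            q[pos, a] += alpha * (r + gamma * (0.0 if done else q[nxt].max()) - q[pos, a])
            pos = nxt
            if done:
                break
    return q

def collect_rollouts(q, goal, task_id, episodes=50, seed=1):
    """Generate (task, state, action) tuples from the specialist's greedy policy."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(episodes):
        pos, done = int(rng.integers(N)), False
        for _ in range(4 * N):
            a = int(q[pos].argmax())
            data.append((task_id, pos, a))
            pos, _, done = step(pos, a, goal)
            if done:
                break
    return data

def distill_generalist(data):
    """Behavior cloning: a single tabular policy over (task, state) pairs."""
    counts = np.zeros((len(TASKS), N, len(ACTIONS)))
    for task_id, pos, a in data:
        counts[task_id, pos, a] += 1
    return counts.argmax(axis=-1)      # pick the most frequent specialist action

def success_rate(policy, trials=100, seed=2):
    """Evaluate the distilled generalist across randomly sampled tasks and starts."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        task_id = int(rng.integers(len(TASKS)))
        pos, done = int(rng.integers(N)), False
        for _ in range(4 * N):
            pos, _, done = step(pos, int(policy[task_id, pos]), TASKS[task_id])
            if done:
                break
        wins += done
    return wins / trials

if __name__ == "__main__":
    dataset = []
    for task_id, goal in TASKS.items():
        q = train_specialist(goal, seed=task_id)
        dataset += collect_rollouts(q, goal, task_id, seed=10 + task_id)
    generalist = distill_generalist(dataset)
    print("generalist success rate:", success_rate(generalist))
```

In the paper's setting, the specialist is a task-specific RL policy trained on a real robot and the generalist is a robotic foundation model finetuned on the specialist's rollouts, but the overall flow (train specialists, collect rollouts, supervise the generalist) is the same.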
Why it matters?
This research is important because it enhances the capabilities of robotic systems, making them more efficient and effective at completing a wide range of tasks. By improving how robots learn, RLDG could lead to advancements in automation and robotics in industries such as manufacturing, healthcare, and service, ultimately making robots more useful in everyday applications.
Abstract
Recent advances in robotic foundation models have enabled the development of generalist policies that can adapt to diverse tasks. While these models show impressive flexibility, their performance heavily depends on the quality of their training data. In this work, we propose Reinforcement Learning Distilled Generalists (RLDG), a method that leverages reinforcement learning to generate high-quality training data for finetuning generalist policies. Through extensive real-world experiments on precise manipulation tasks like connector insertion and assembly, we demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations, achieving up to 40% higher success rates while generalizing better to new tasks. We also provide a detailed analysis that reveals this performance gain stems from both optimized action distributions and improved state coverage. Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems that maintain the flexibility of foundation models while achieving the performance of specialized controllers. Videos and code can be found on our project website https://generalist-distillation.github.io
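
As a point of reference, distilling RL-generated data into a generalist policy is typically done by supervised finetuning (behavior cloning) on the RL rollouts. Under that assumption (the abstract does not spell out the exact loss), the objective has the standard form

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{(o,\,a)\,\sim\,\mathcal{D}_{\mathrm{RL}}}\big[\log \pi_\theta(a \mid o)\big],$$

where $\mathcal{D}_{\mathrm{RL}}$ is the dataset of observation-action pairs collected from the task-specific RL policies and $\pi_\theta$ is the generalist policy being finetuned.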