Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

Ria Doshi, Homer Walke, Oier Mees, Sudeep Dasari, Sergey Levine

2024-08-22

Summary

This paper introduces CrossFormer, a single transformer-based policy that lets many different types of robots learn from shared data and perform tasks with the same network weights, improving their ability to work across varied environments.

What's the problem?

In robot learning, most systems need large amounts of data to perform well, but each robot type often has only a small dataset of its own. This makes it hard for robots to learn effectively, especially because different robots have different sensors, actuators, and control schemes. Training a separate model for each robot is inefficient and forgoes the much broader data available across platforms.

What's the solution?

The authors introduce CrossFormer, a flexible transformer-based policy that can learn from data collected on many types of robots. They trained CrossFormer on the largest and most diverse robot dataset to date: 900,000 trajectories spanning 20 different robot embodiments. The same network weights can control vastly different robots, including single- and dual-arm manipulators, wheeled robots, quadcopters, and quadrupeds, without training a separate model for each type or manually aligning their observation and action spaces.
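To make the idea concrete, here is a minimal, hypothetical sketch of a cross-embodiment policy in the spirit of this design: per-embodiment observation tokenizers and action heads wrapped around a single shared transformer trunk. All names, dimensions, and module choices below are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CrossEmbodimentPolicy(nn.Module):
    """Illustrative sketch: one shared trunk, per-embodiment I/O modules."""

    def __init__(self, obs_dims, action_dims, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        # Per-embodiment tokenizers project raw observation features to tokens.
        self.tokenizers = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in obs_dims.items()}
        )
        # Shared transformer trunk consumes token sequences from any embodiment.
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, n_layers)
        # A learned readout token and a linear head per embodiment action space.
        self.readouts = nn.ParameterDict(
            {name: nn.Parameter(torch.zeros(1, 1, d_model)) for name in action_dims}
        )
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d_model, dim) for name, dim in action_dims.items()}
        )

    def forward(self, embodiment, obs):
        # obs: (batch, seq, obs_dim) raw observation features for one embodiment.
        tokens = self.tokenizers[embodiment](obs)
        readout = self.readouts[embodiment].expand(tokens.shape[0], -1, -1)
        x = self.trunk(torch.cat([tokens, readout], dim=1))
        # Actions are decoded from the readout token's final representation.
        return self.heads[embodiment](x[:, -1])

# The same weights serve two hypothetical embodiments with different
# observation and action dimensionalities.
policy = CrossEmbodimentPolicy(
    obs_dims={"single_arm": 32, "quadruped": 48},
    action_dims={"single_arm": 7, "quadruped": 12},
)
arm_action = policy("single_arm", torch.randn(2, 5, 32))   # shape (2, 7)
quad_action = policy("quadruped", torch.randn(2, 5, 48))   # shape (2, 12)
```

The key design point this sketch illustrates is that only the thin input and output layers are embodiment-specific; the bulk of the parameters sit in the shared trunk, so every robot's data contributes to the same representation.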

Why it matters?

This research is important because it shows how robots can share learning experiences, making them smarter and more capable without needing excessive data for each individual robot. This could lead to more advanced robotics applications in areas like manufacturing, delivery, and exploration.

Abstract

Modern machine learning systems rely on large datasets to attain broad generalization, and this often poses a challenge in robot learning, where each robotic platform and task might have only a small dataset. By training a single policy across many different kinds of robots, a robot learning method can leverage much broader and more diverse datasets, which in turn can lead to better generalization and robustness. However, training a single policy on multi-robot data is challenging because robots can have widely varying sensors, actuators, and control frequencies. We propose CrossFormer, a scalable and flexible transformer-based policy that can consume data from any embodiment. We train CrossFormer on the largest and most diverse dataset to date, 900K trajectories across 20 different robot embodiments. We demonstrate that the same network weights can control vastly different robots, including single and dual arm manipulation systems, wheeled robots, quadcopters, and quadrupeds. Unlike prior work, our model does not require manual alignment of the observation or action spaces. Extensive experiments in the real world show that our method matches the performance of specialist policies tailored for each embodiment, while also significantly outperforming the prior state of the art in cross-embodiment learning.