Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies
Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu
2024-10-15

Summary
This paper introduces Improved 3D Diffusion Policies (iDP3), a method that enables humanoid robots to perform manipulation tasks in varied real-world environments using only data collected in a lab.
What's the problem?
Humanoid robots have struggled to operate effectively across different settings because the skills they learn rarely transfer to new environments. Previous methods for teaching these robots to manipulate objects relied on precise camera calibration and point-cloud segmentation, which made them hard to deploy in real-world situations, especially on mobile platforms.
What's the solution?
The authors introduce iDP3, an approach that simplifies how humanoid robots learn to perform tasks. Instead of requiring camera calibration or point-cloud segmentation, iDP3 uses egocentric 3D visual representations, meaning the robot perceives its surroundings directly from its own point of view. This allows it to perform tasks in diverse real-world scenarios without recalibration or additional data collection beyond what was gathered in the lab.
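To make the idea of an egocentric 3D representation concrete, the sketch below back-projects a depth image into a point cloud expressed in the camera's own frame. This is a minimal illustration of the general principle, not the authors' implementation; the function name and parameters are assumptions. The key point is that no extrinsic calibration is applied: the points stay relative to the robot's viewpoint, so the policy never depends on knowing where the camera sits in the world.

```python
import numpy as np

def depth_to_egocentric_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into a 3D point cloud using the pinhole
    camera model. Points are kept in the camera's (egocentric) frame; no
    camera-to-world extrinsic calibration is needed."""
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx  # pinhole back-projection, x-axis
    y = (v - cy) * z / fy  # pinhole back-projection, y-axis
    cloud = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop invalid pixels (zero depth), a common minimal filtering step.
    return cloud[cloud[:, 2] > 0]

# Toy example: a 2x2 depth image with unit focal lengths and the
# principal point at the image origin (values chosen for illustration).
depth = np.array([[1.0, 2.0],
                  [0.0, 4.0]])
cloud = depth_to_egocentric_cloud(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Because every point is expressed relative to the robot's own camera, the same representation works wherever the robot is placed, which is what lets skills trained in the lab transfer to new scenes.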
Why it matters?
This research is important because it enhances the capabilities of humanoid robots, making them more versatile and effective in everyday tasks. By improving how these robots learn and adapt, iDP3 can lead to advancements in robotics that could benefit areas like home assistance, manufacturing, and service industries.
Abstract
Humanoid robots capable of autonomous operation in diverse environments have long been a goal for roboticists. However, autonomous manipulation by humanoid robots has largely been restricted to one specific scene, primarily due to the difficulty of acquiring generalizable skills. Recent advances in 3D visuomotor policies, such as the 3D Diffusion Policy (DP3), have shown promise in extending these capabilities to wilder environments. However, 3D visuomotor policies often rely on camera calibration and point-cloud segmentation, which present challenges for deployment on mobile robots like humanoids. In this work, we introduce the Improved 3D Diffusion Policy (iDP3), a novel 3D visuomotor policy that eliminates these constraints by leveraging egocentric 3D visual representations. We demonstrate that iDP3 enables a full-sized humanoid robot to autonomously perform skills in diverse real-world scenarios, using only data collected in the lab. Videos are available at: https://humanoid-manipulation.github.io