OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi

2025-10-10

Summary

This paper makes it easier to teach humanoid robots complex movements by using recordings of humans performing those movements. It focuses on improving how those human motions are transferred, or 'retargeted', to robots so they can learn more effectively.

What's the problem?

When researchers copy human movements onto robots, problems arise. Because humans and robots have different body shapes and proportions, the copied motions often look unnatural or are physically impossible for the robot, with the robot's feet sliding across the ground or parts of its body clipping through objects. Existing methods also fail to account for how humans interact with objects and their surroundings, which is crucial for realistic and useful movements.

What's the solution?

The researchers developed a system called OmniRetarget. It builds a digital 'interaction mesh' that captures the important spatial relationships between the robot, the terrain it moves over, and any objects it manipulates. OmniRetarget then deforms the human motion to fit the robot's body while keeping the robot balanced and avoiding physically impossible poses, and it preserves how the human touches and moves objects so the robot can learn those interactions too. Because these relationships are encoded explicitly, a single human demonstration can be expanded into a large amount of training data adapted to different robots, terrains, and object placements (see the sketch below).
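To make the interaction-mesh idea concrete, here is a minimal runnable sketch in Python. It is an assumption-laden toy, not the paper's implementation: it uses uniform neighbor weights, a tiny hand-made neighbor graph, and an unconstrained optimizer, while the actual engine additionally enforces kinematic constraints such as joint limits, contacts, and non-penetration. The names (laplacian_coords, deformation_energy) are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inputs: 3D keypoints on the human/robot, object, and terrain,
# plus a neighbor list (in the paper, from a tetrahedralization of the scene).

def laplacian_coords(points, neighbors):
    """Laplacian coordinate of each vertex: its offset from the
    (uniformly weighted) centroid of its mesh neighbors."""
    lap = np.empty_like(points)
    for i, nbrs in enumerate(neighbors):
        lap[i] = points[i] - points[nbrs].mean(axis=0)
    return lap

def deformation_energy(robot_flat, human_lap, neighbors):
    """Sum of squared differences between the robot mesh's Laplacian
    coordinates and the human mesh's -- the deformation being minimized."""
    robot = robot_flat.reshape(-1, 3)
    return np.sum((laplacian_coords(robot, neighbors) - human_lap) ** 2)

# Toy scene: 4 interacting points with a fully connected neighbor graph.
human = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
neighbors = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
human_lap = laplacian_coords(human, neighbors)

# Start from a scaled guess (standing in for the robot's different
# proportions) and recover a configuration whose spatial relationships
# match the human demonstration.
x0 = (1.3 * human).ravel()
res = minimize(deformation_energy, x0, args=(human_lap, neighbors))
print(res.fun)  # near zero: the interaction structure is preserved
```

The intuition: each point's Laplacian coordinate records where it sits relative to its neighbors, so matching those coordinates keeps the robot, the object, and the terrain in the same relative arrangement as in the human demonstration, even when the robot's proportions differ.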

Why it matters?

This work matters because it lets robots learn complex skills, like parkour or manipulating objects while moving, much more easily and reliably. Higher-quality training data means robots can acquire these skills with a simpler learning setup and less fine-tuning, making them more adaptable and useful in real-world situations. The fact that the team trained a real humanoid to perform long, complex tasks with only five reward terms and no learning curriculum shows how effective the method is.

Abstract

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
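Written in our own notation (the symbols below are assumptions, not taken verbatim from the paper), the core optimization the abstract describes is a constrained Laplacian-deformation minimization over the retargeted trajectory:

$$
\min_{\mathbf{q}_{1:T}} \; \sum_{t=1}^{T} \sum_{i} \left\| L\big(\mathbf{v}_i(\mathbf{q}_t)\big) - L\big(\mathbf{v}_i^{\mathrm{human}}(t)\big) \right\|^2
\quad \text{subject to joint limits, contact, and non-penetration constraints,}
$$

where $\mathbf{q}_t$ is the robot configuration at time $t$, the vertices $\mathbf{v}_i$ span the robot's body, the manipulated object, and the terrain, and $L(\mathbf{v}_i) = \mathbf{v}_i - \sum_{j \in \mathcal{N}(i)} w_{ij}\,\mathbf{v}_j$ is the Laplacian coordinate of vertex $i$ with neighbor weights $w_{ij}$.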