TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
Yanjie Ze, Siheng Zhao, Weizhuo Wang, Angjoo Kanazawa, Rocky Duan, Pieter Abbeel, Guanya Shi, Jiajun Wu, C. Karen Liu
2025-11-05
Summary
This paper introduces TWIST2, a system that makes it easy to teleoperate humanoid robots and collect whole-body demonstration data, allowing the robots to learn complex skills more quickly.
What's the problem?
Teaching humanoid robots is currently difficult: existing teleoperation systems either control different body parts separately, losing natural whole-body coordination, or rely on expensive and complicated motion capture setups to record how a human performs the task. Neither approach offers natural full-body control, and neither scales easily to collecting large amounts of training data.
What's the solution?
The researchers created TWIST2, which uses a VR headset (PICO4U) to track a person's whole-body movements in real time and translate them directly to the robot, with no motion capture studio required. They also added a small, inexpensive 2-degree-of-freedom neck (around $250) to the robot to give it a human-like, first-person viewpoint. This lets one person control the robot's entire body naturally and gather demonstration data quickly: the authors collected 100 successful demonstrations in just 15 minutes. They then used this data to train the robot to perform tasks on its own with a hierarchical control system that focuses on what the robot *sees* through its own camera, as sketched below.
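To make the teleoperation idea concrete, here is a minimal sketch of a mocap-free whole-body teleop loop: read a human pose from a VR body tracker, retarget it to robot joint targets, drive the 2-DoF neck from the head orientation, and log each frame as demonstration data. The names and shapes (e.g. `VRBodyTracker`, a 29-DoF body, the linear retargeting stub) are illustrative assumptions, not the authors' actual API.

```python
# Hypothetical sketch of a whole-body teleoperation loop (not the TWIST2 codebase).
import time
from dataclasses import dataclass
import numpy as np

@dataclass
class BodyPose:
    """Whole-body human pose read from the VR headset/controllers."""
    keypoints: np.ndarray  # (K, 3) 3D positions of tracked body keypoints
    head_rpy: np.ndarray   # (3,) head roll/pitch/yaw, used to drive the robot neck

class VRBodyTracker:
    """Stand-in for a PICO4U body-tracking stream (stubbed with a neutral pose)."""
    def read(self) -> BodyPose:
        return BodyPose(keypoints=np.zeros((24, 3)), head_rpy=np.zeros(3))

def retarget(pose: BodyPose, joint_limits: np.ndarray) -> np.ndarray:
    """Map human keypoints to robot joint targets.
    Placeholder linear map; a real system would run kinematic retargeting / IK here."""
    q = np.tanh(pose.keypoints.reshape(-1)[: len(joint_limits)])
    return np.clip(q, joint_limits[:, 0], joint_limits[:, 1])

def teleop_episode(hz: float = 50.0, horizon_s: float = 2.0) -> list:
    """Run one teleop episode: read VR pose, retarget, command robot, log data."""
    tracker = VRBodyTracker()
    joint_limits = np.tile(np.array([[-1.5, 1.5]]), (29, 1))  # example 29-DoF body
    episode = []
    for _ in range(int(horizon_s * hz)):
        pose = tracker.read()
        q_target = retarget(pose, joint_limits)
        neck_cmd = pose.head_rpy[:2]          # 2-DoF neck follows head pitch/yaw
        # robot.send(q_target, neck_cmd)      # hardware call omitted in this sketch
        episode.append({"q_target": q_target, "neck": neck_cmd, "t": time.time()})
        time.sleep(1.0 / hz)
    return episode

if __name__ == "__main__":
    demo = teleop_episode(horizon_s=0.1)
    print(f"logged {len(demo)} frames")
```

Each logged frame would, in practice, also store the synchronized egocentric camera image, so that the recorded episodes can later train the vision-based policy described below.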
Why it matters?
This work matters because it makes developing and training humanoid robots much easier and cheaper. By providing a simple, portable way to collect large datasets of human demonstrations, together with a policy framework that turns those demonstrations into autonomous whole-body control, TWIST2 opens the door to more capable humanoid robots that can perform a wider range of tasks.
Abstract
Large-scale data has driven breakthroughs in robotics, from language models to vision-language-action models in bimanual manipulation. However, humanoid robotics lacks equally effective data collection frameworks. Existing humanoid teleoperation systems either use decoupled control or depend on expensive motion capture setups. We introduce TWIST2, a portable, mocap-free humanoid teleoperation and data collection system that preserves full whole-body control while advancing scalability. Our system leverages PICO4U VR for obtaining real-time whole-body human motions, with a custom 2-DoF robot neck (cost around $250) for egocentric vision, enabling holistic human-to-humanoid control. We demonstrate long-horizon dexterous and mobile humanoid skills, and we can collect 100 demonstrations in 15 minutes with an almost 100% success rate. Building on this pipeline, we propose a hierarchical visuomotor policy framework that autonomously controls the full humanoid body based on egocentric vision. Our visuomotor policy successfully demonstrates whole-body dexterous manipulation and dynamic kicking tasks. The entire system is fully reproducible and open-sourced at https://yanjieze.com/TWIST2. Our collected dataset is also open-sourced at https://twist-data.github.io.
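The abstract's hierarchical visuomotor policy can be pictured as two levels: a high-level policy that maps the egocentric camera view plus proprioception to whole-body motion targets, and a low-level controller that tracks those targets on the robot. The sketch below illustrates this split with hypothetical module names and dimensions (`EgocentricEncoder`, `HighLevelPolicy`, a PD-style tracker); the paper's actual architecture and interfaces may differ.

```python
# Hypothetical two-level visuomotor stack (illustrative, not the paper's implementation).
import torch
import torch.nn as nn

class EgocentricEncoder(nn.Module):
    """Encode the robot's first-person camera image into a compact feature vector."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:  # img: (B, 3, H, W)
        return self.net(img)

class HighLevelPolicy(nn.Module):
    """Map egocentric features + proprioception to whole-body motion targets
    that a low-level controller then tracks."""
    def __init__(self, feat_dim: int = 128, proprio_dim: int = 29, target_dim: int = 29):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, target_dim),
        )

    def forward(self, feat: torch.Tensor, proprio: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([feat, proprio], dim=-1))

def low_level_track(q: torch.Tensor, q_target: torch.Tensor, kp: float = 30.0) -> torch.Tensor:
    """Placeholder low-level tracker: PD-style commands toward the high-level targets
    (the real system would use a learned whole-body tracking controller)."""
    return kp * (q_target - q)

if __name__ == "__main__":
    encoder, policy = EgocentricEncoder(), HighLevelPolicy()
    img = torch.zeros(1, 3, 96, 96)   # egocentric frame from the neck-mounted camera
    q = torch.zeros(1, 29)            # current joint positions (example 29 DoF)
    target = policy(encoder(img), q)  # high-level whole-body targets
    tau = low_level_track(q, target)  # low-level command toward the targets
    print(target.shape, tau.shape)
```

In this setup, the high-level policy is the part trained on the teleoperated demonstrations, while the low-level tracker keeps the robot stable as it follows the predicted whole-body targets.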