Fidelity-Aware Data Composition for Robust Robot Generalization
Zizhao Tong, Di Chen, Sicheng Hu, Hongwei Fan, Liliang Chen, Guanghui Ren, Hao Tang, Hao Dong, Ling Shao
2025-10-10
Summary
This research focuses on making robots more reliable when they encounter situations they haven't specifically been trained for, a concept called out-of-distribution generalization.
What's the problem?
Robots are often trained on large collections of images or videos, but if all this training data looks very similar, the robot learns to rely on superficial cues instead of understanding the underlying task, and it fails in new environments. Simply adding more diverse but potentially unrealistic training data doesn't always fix this, because it can corrupt the learning signal; what matters is *how* real and synthetic data are combined.
What's the solution?
The researchers developed a method called Coherent Information Fidelity Tuning, or CIFT. Rather than just creating more training data, CIFT carefully controls *how* real and artificially generated data are mixed together during training. The researchers devised a way to measure how much 'useful information' is in the data, based on the geometry of its features, and use that to find the sweet spot where adding synthetic data helps without destabilizing the robot's learning process. They also created a tool, Multi-View Video Augmentation (MVAug), to generate realistic training data for this process.
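The paper does not spell out its exact fidelity metric here, but the core idea, sweeping the real-to-synthetic mixing ratio and watching a feature-space statistic degrade, can be sketched. In the toy example below, `effective_rank` is a hypothetical stand-in proxy (not the authors' actual Information Fidelity measure), the isotropic-noise "synthetic" features and the `mix_and_score` helper are illustrative assumptions, and the ratio where the proxy shifts sharply would play the role of the Decoherence Point.

```python
import numpy as np

def effective_rank(features):
    """Effective rank of a centered feature matrix: the exponential of the
    entropy of its normalized singular values. Used here as a rough,
    illustrative proxy for feature-space 'information fidelity'."""
    s = np.linalg.svd(features - features.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def mix_and_score(real, synthetic, ratios, rng):
    """For each synthetic-data ratio, build a mixed feature set of fixed
    size and score it with the proxy. Returns one score per ratio."""
    n = min(len(real), len(synthetic))
    scores = []
    for r in ratios:
        k = int(r * n)
        idx_s = rng.choice(len(synthetic), size=k, replace=False)
        idx_r = rng.choice(len(real), size=n - k, replace=False)
        mixed = np.vstack([real[idx_r], synthetic[idx_s]])
        scores.append(effective_rank(mixed))
    return scores

rng = np.random.default_rng(0)
d = 32
# 'Real' features: structured (a low-dimensional signal plus small noise).
real = rng.normal(size=(500, 4)) @ rng.normal(size=(4, d)) \
    + 0.05 * rng.normal(size=(500, d))
# 'Synthetic' features: isotropic noise, i.e. visually diverse but
# carrying no task structure (low fidelity).
synthetic = rng.normal(size=(500, d))

ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = mix_and_score(real, synthetic, ratios, rng)
for r, s in zip(ratios, scores):
    print(f"synthetic ratio {r:.2f}: proxy score {s:.1f}")
```

In this toy setting the proxy stays low while structured real features dominate and climbs as unstructured synthetic features dilute them; tuning the mix to stay below the ratio where the statistic shifts sharply is the spirit of what CIFT formalizes.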
Why does it matter?
This work is important because it shows that simply creating more diverse training data isn't enough for building robust robots. It's crucial to focus on the *quality* and *coherence* of the data, ensuring that the robot learns meaningful information and can adapt to new situations. This approach improved robots' success rates in unfamiliar environments by more than 50%, bringing us closer to truly general-purpose robots.
Abstract
Generalist robot policies trained on large-scale, visually homogeneous datasets can be susceptible to shortcut learning, which impairs their out-of-distribution (OOD) generalization. While generative data augmentation is a common approach to introduce diversity, it presents a subtle challenge: data composition. Naively mixing real and synthetic data can corrupt the learning signal, as this process often prioritizes visual diversity at the expense of information fidelity. This paper suggests that robust generalization depends on principled, fidelity-aware data composition. We introduce Coherent Information Fidelity Tuning (CIFT), a framework that treats data composition as an optimization problem. CIFT uses a practical proxy for Information Fidelity based on the feature-space geometry of a dataset. This enables the identification of a phase transition, termed the Decoherence Point, where training stability degrades. The framework includes a generative engine, Multi-View Video Augmentation (MVAug), to synthesize a causally disentangled data spectrum for this tuning process. Applying CIFT to policy architectures such as pi_0 and Diffusion Policy improves OOD success rates by over 54%. These results indicate that fidelity-aware composition, beyond data synthesis alone, is an important component for developing robust, general-purpose robots.