Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang
2024-12-12

Summary
This paper presents the Self-Refining Data Flywheel (SRDF), a method that generates high-quality data for training AI systems to navigate from natural-language instructions, without any human annotation in the loop.
What's the problem?
Training AI agents to follow complex navigation instructions requires large amounts of high-quality instruction-trajectory data. Producing this data usually relies on human annotation, which is slow and expensive, while existing automatic generation methods often yield low-quality instructions that limit how effectively the agent learns.
What's the solution?
The authors introduce the SRDF, in which two models cooperate: an instruction generator that writes navigation instructions for trajectories, and a navigator that tries to follow them. Pairs the navigator can follow successfully are kept and used to retrain the generator, which in turn produces higher-quality data for training the next, stronger navigator. Repeating this cycle lets the dataset refine itself automatically, improving both models round after round (see the sketch below).
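In rough terms, the flywheel could look like the Python sketch below. This is a minimal illustration under assumptions; the names (train_generator, train_navigator, filter_by_navigator, annotate) are hypothetical stand-ins for the paper's components, not the authors' actual code.

```python
# Illustrative sketch of the SRDF loop. All names here (train_generator,
# train_navigator, filter_by_navigator, annotate) are hypothetical stand-ins,
# not the authors' implementation.

def self_refining_data_flywheel(seed_data, environments, num_rounds=3):
    # Round 0: train a base generator on seed data and build an initial pool
    # of instruction-trajectory pairs.
    generator = train_generator(seed_data)
    data_pool = generator.annotate(environments)

    navigator = None
    for _ in range(num_rounds):
        # Train a navigator on the current data pool.
        navigator = train_navigator(data_pool)
        # Keep only pairs the navigator can actually follow (higher-fidelity data).
        data_pool = filter_by_navigator(navigator, data_pool)
        # Retrain the generator on the filtered data, then regenerate a better,
        # larger pool for the next-round navigator.
        generator = train_generator(data_pool)
        data_pool = generator.annotate(environments)

    return generator, navigator
```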
Why it matters?
This research is important because it reduces the need for human involvement in data creation, making it easier and faster to train AI systems for navigation tasks. By improving the quality of training data through self-refinement, the SRDF enhances the ability of AI to understand and follow complex instructions, which can lead to better applications in robotics, virtual assistants, and more.
Abstract
Creating high-quality data for training robust language-instructed agents is a long-standing challenge in embodied AI. In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration between two models, the instruction generator and the navigator, without any human-in-the-loop annotation. Specifically, SRDF starts with using a base generator to create an initial data pool for training a base navigator, followed by applying the trained navigator to filter the data pool. This leads to higher-fidelity data to train a better generator, which can, in turn, produce higher-quality data for training the next-round navigator. Such a flywheel establishes a data self-refining process, yielding a continuously improved and highly effective dataset for large-scale language-guided navigation learning. Our experiments demonstrate that after several flywheel rounds, the navigator elevates the performance boundary from 70% to 78% SPL on the classic R2R test set, surpassing human performance (76%) for the first time. Meanwhile, this process results in a superior generator, evidenced by a SPICE increase from 23.5 to 26.2, outperforming all previous VLN instruction generation methods. Finally, we demonstrate the scalability of our method through increasing environment and instruction diversity, and the generalization ability of our pre-trained navigator across various downstream navigation tasks, surpassing state-of-the-art methods by a large margin in all cases.
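The navigator-based filtering step described in the abstract could, under one plausible reading, keep only the pairs whose instructions the trained navigator can reproduce. The sketch below is an assumption-laden illustration: path_fidelity and fidelity_threshold are invented stand-ins (e.g., an nDTW-style path-similarity score), not necessarily the paper's exact filtering criterion.

```python
# Hypothetical filtering step: run the trained navigator on each generated
# instruction and keep only pairs it follows faithfully. path_fidelity and the
# threshold are assumed stand-ins, not the paper's exact criterion.

def filter_by_navigator(navigator, data_pool, fidelity_threshold=0.9):
    filtered = []
    for instruction, gt_trajectory in data_pool:
        predicted_trajectory = navigator.follow(instruction)
        score = path_fidelity(predicted_trajectory, gt_trajectory)
        if score >= fidelity_threshold:
            filtered.append((instruction, gt_trajectory))
        # Low-fidelity pairs are dropped, so the next-round generator retrains
        # only on instructions that match their trajectories.
    return filtered
```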