GRAPE: Generalizing Robot Policy via Preference Alignment
Zijian Zhang, Kaiyuan Zheng, Zhaorun Chen, Joel Jang, Yi Li, Chaoqi Wang, Mingyu Ding, Dieter Fox, Huaxiu Yao
2024-12-02

Summary
This paper introduces GRAPE, a method that helps robots perform tasks better by aligning their actions with user preferences, making them more adaptable to new situations and objectives.
What's the problem?
Many existing robot models struggle to adapt to tasks they haven't been specifically trained on. They typically learn by mimicking successful expert demonstrations, which breaks down when the robot faces scenarios that differ from those demonstrations. It also gives the robot no way to trade off between objectives like efficiency, safety, and task completion.
What's the solution?
GRAPE addresses these challenges by letting robots learn from both successful and failed attempts at a task, rather than only copying successes. It breaks complex tasks down into simpler stages and scores trajectories against constraints that encode user preferences for different objectives, such as safety or efficiency. Because the preference data comes from the robot's own rollouts rather than new expert demonstrations, the approach improves generalization to unseen tasks; a minimal sketch of the core idea appears below.
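To make the preference-alignment idea concrete, here is a minimal sketch of a trajectory-level, DPO-style loss of the kind the paper describes: a preferred rollout's likelihood is pushed above a dispreferred one's, relative to a frozen reference policy. The function and variable names are illustrative, not taken from the GRAPE codebase.

```python
# Minimal sketch of a trajectory-level preference (DPO-style) loss, assuming
# per-step action log-probabilities are available from the current policy and
# from a frozen reference policy. Names here are illustrative, not GRAPE's API.
import torch
import torch.nn.functional as F

def trajectory_preference_loss(
    logp_chosen: torch.Tensor,       # (T_w,) per-step log-probs of the preferred trajectory, current policy
    logp_rejected: torch.Tensor,     # (T_l,) per-step log-probs of the dispreferred trajectory, current policy
    ref_logp_chosen: torch.Tensor,   # same shapes, under the frozen reference policy
    ref_logp_rejected: torch.Tensor,
    beta: float = 0.1,               # temperature controlling deviation from the reference policy
) -> torch.Tensor:
    # A trajectory's log-likelihood is the sum of its per-step action log-probs.
    chosen_logratio = logp_chosen.sum() - ref_logp_chosen.sum()
    rejected_logratio = logp_rejected.sum() - ref_logp_rejected.sum()
    # Bradley-Terry style objective: push the preferred trajectory's
    # likelihood ratio above the dispreferred one's.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))

# Toy usage with random numbers standing in for real policy outputs.
torch.manual_seed(0)
loss = trajectory_preference_loss(
    logp_chosen=torch.randn(50),
    logp_rejected=torch.randn(60),
    ref_logp_chosen=torch.randn(50),
    ref_logp_rejected=torch.randn(60),
)
print(loss.item())
```

Summing per-step log-probabilities treats the whole trajectory as the unit of preference, which is what distinguishes this from token- or step-level alignment.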
Why it matters?
This research is important because it enhances how robots can learn and adapt to various tasks in real-world environments. By improving their ability to align with user preferences and perform effectively across different situations, GRAPE can lead to more capable robots that can assist in many fields, such as healthcare, manufacturing, and home automation.
Abstract
Despite the recent advancements of vision-language-action (VLA) models on a variety of robotics tasks, they suffer from critical issues such as poor generalizability to unseen tasks due to their reliance on behavior cloning exclusively from successful rollouts. Furthermore, they are typically fine-tuned to replicate demonstrations collected by experts under different settings, thus introducing distribution bias and limiting their adaptability to diverse manipulation objectives, such as efficiency, safety, and task completion. To bridge this gap, we introduce GRAPE: Generalizing Robot Policy via Preference Alignment. Specifically, GRAPE aligns VLAs at the trajectory level and implicitly models reward from both successful and failed trials to boost generalizability to diverse tasks. Moreover, GRAPE breaks down complex manipulation tasks into independent stages and automatically guides preference modeling through customized spatiotemporal constraints with keypoints proposed by a large vision-language model. Notably, these constraints are flexible and can be customized to align the model with varying objectives, such as safety, efficiency, or task success. We evaluate GRAPE across a diverse array of tasks in both real-world and simulated environments. Experimental results demonstrate that GRAPE enhances the performance of state-of-the-art VLA models, increasing success rates on in-domain and unseen manipulation tasks by 51.79% and 60.36%, respectively. Additionally, GRAPE can be aligned with various objectives, such as safety and efficiency, reducing collision rates by 44.31% and rollout step length by 11.15%, respectively. All code, models, and data are available at https://grape-vla.github.io/.
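The abstract's constraint-guided preference modeling can also be sketched: score each sampled rollout from its stage-wise constraint costs and task outcome, then rank rollouts to form chosen/rejected pairs for a loss like the one above. The scoring function, success bonus, and weighting below are hypothetical placeholders, not GRAPE's actual formulation.

```python
# Hypothetical sketch of scoring rollouts with stage-wise constraint costs
# and pairing them for preference training. Cost aggregation and the success
# bonus are illustrative assumptions, not the paper's exact method.
from dataclasses import dataclass
from typing import List, Tuple
import math

@dataclass
class Rollout:
    stage_costs: List[float]  # one aggregated constraint cost per task stage
    success: bool             # whether the rollout completed the task

def rollout_score(r: Rollout, alpha: float = 1.0) -> float:
    # Lower accumulated constraint cost -> higher score; a success bonus
    # keeps completed rollouts ranked above failed ones.
    total_cost = sum(r.stage_costs)
    return (1.0 if r.success else 0.0) + alpha * math.exp(-total_cost)

def build_preference_pairs(rollouts: List[Rollout]) -> List[Tuple[Rollout, Rollout]]:
    ranked = sorted(rollouts, key=rollout_score, reverse=True)
    # Pair best-vs-worst, second-best-vs-second-worst, and so on.
    return [(ranked[i], ranked[-1 - i]) for i in range(len(ranked) // 2)]

pairs = build_preference_pairs([
    Rollout(stage_costs=[0.2, 0.1], success=True),
    Rollout(stage_costs=[1.5, 0.9], success=False),
    Rollout(stage_costs=[0.4, 0.6], success=True),
    Rollout(stage_costs=[2.0, 1.1], success=False),
])
for chosen, rejected in pairs:
    print(f"{rollout_score(chosen):.3f} > {rollout_score(rejected):.3f}")
```

Swapping in different stage-cost definitions (e.g., penalizing proximity to obstacles versus penalizing extra steps) is what lets the same pipeline target safety, efficiency, or task success.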