Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments

Haritheja Etukuru, Norihito Naka, Zijin Hu, Seungjae Lee, Julian Mehu, Aaron Edsinger, Chris Paxton, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah

2024-09-10

Summary

This paper introduces Robot Utility Models (RUMs), a framework for training and deploying robot policies that work in new environments zero-shot, without any additional training or fine-tuning.

What's the problem?

Robot policies usually need to be fine-tuned or retrained for every new environment, which is time-consuming and inefficient. This limits how quickly robots can be put to work in new settings, in contrast to language and vision models, which can often be deployed zero-shot on open-world problems.

What's the solution?

The authors introduce RUMs, policies that generalize to new environments without any fine-tuning. They built tools to quickly collect demonstration data for mobile manipulation tasks, trained policies on that data with multi-modal imitation learning, and deployed them on-device on a Hello Robot Stretch, with an external multimodal LLM (mLLM) verifier that triggers retries. They trained five utility models: opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. On average, the system achieves a 90% success rate in unseen environments while interacting with unseen objects.

Why does it matter?

Policies that work in new environments without retraining make robots far more practical to deploy in homes, workplaces, and industry. The authors also report that the utility models succeed on different robot and camera setups with no further data or training, and that data quality and diversity mattered more than the choice of training algorithm or policy class. The code, data, models, and hardware designs are open sourced.

Abstract

Robot models, particularly those trained with large amounts of data, have recently shown a plethora of real-world manipulation and navigation capabilities. Several independent efforts have shown that given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to finetune robot models to every new environment stands in stark contrast to models in language or vision that can be deployed zero-shot for open-world problems. In this work, we present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that can directly generalize to new environments without any finetuning. To create RUMs efficiently, we develop new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on Hello Robot Stretch, a cheap commodity robot, with an external mLLM verifier for retrying. We train five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. Our system, on average, achieves 90% success rate in unseen, novel environments interacting with unseen objects. Moreover, the utility models can also succeed in different robot and camera set-ups with no further data, training, or fine-tuning. Primary among our lessons are the importance of training data over training algorithm and policy class, guidance about data scaling, necessity for diverse yet high-quality demonstrations, and a recipe for robot introspection and retrying to improve performance on individual environments. Our code, data, models, hardware designs, as well as our experiment and deployment videos are open sourced and can be found on our project website: https://robotutilitymodels.com
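The abstract mentions an external mLLM verifier used for retrying but does not describe its interface here. As a rough illustration only, the general verify-and-retry pattern can be sketched as follows. Every name in this sketch (`run_with_retries`, `attempt`, `verify`, `reset`, `max_attempts`) is a hypothetical placeholder and not the authors' API, and the stub policy and verifier stand in for a real rollout and a real mLLM query.

```python
from typing import Callable, List, Tuple


def run_with_retries(
    attempt: Callable[[], str],
    verify: Callable[[str], bool],
    reset: Callable[[], None],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Run a rollout, ask a verifier whether it succeeded, retry on failure.

    attempt: hypothetical callable that runs one policy rollout and returns
             an observation of the final scene (here just a string).
    verify:  hypothetical stand-in for the external verifier's yes/no answer.
    reset:   hypothetical callable that returns the robot to a start pose.
    Returns (succeeded, number_of_attempts_used).
    """
    for n in range(1, max_attempts + 1):
        observation = attempt()
        if verify(observation):
            return True, n
        if n < max_attempts:
            reset()  # only reset when another attempt will follow
    return False, max_attempts


def make_stub_policy(outcomes: List[str]) -> Callable[[], str]:
    """Build a fake rollout function that replays scripted outcomes in order."""
    iterator = iter(outcomes)
    return lambda: next(iterator)


resets: List[str] = []
succeeded, attempts_used = run_with_retries(
    attempt=make_stub_policy(["door closed", "door open"]),
    verify=lambda obs: obs == "door open",  # assumption: verifier says yes/no
    reset=lambda: resets.append("reset"),
)
print(succeeded, attempts_used)  # True 2
```

In this toy run the first scripted rollout fails verification, the robot is reset once, and the second rollout passes. A real deployment would replace the stubs with an actual policy rollout and a query to the verifier model.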
