DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Peiqi Liu, Zhanqiu Guo, Mohit Warke, Soumith Chintala, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto

2024-11-08

Summary

This paper introduces DynaMem, a new system that helps robots understand and interact with changing environments by using a dynamic memory that updates as they explore.

What's the problem?

Most current robotic systems assume that the environment is static and unchanging, which is not realistic since things can move or change due to human activity or the robot's actions. This limitation makes it difficult for robots to perform tasks effectively in real-world scenarios where conditions are constantly evolving.

What's the solution?

DynaMem uses a dynamic spatio-semantic memory to keep track of the robot's surroundings in real time. It builds a 3D point-cloud map of the environment that updates automatically as the robot moves and observes changes, and it answers natural language queries about object locations using multimodal LLMs or open-vocabulary vision-language features, even for objects that were not previously in its memory. The researchers tested DynaMem on Stretch SE3 robots in three real and nine offline scenes and found that it significantly improved the robots' ability to pick and drop non-stationary objects, achieving a 70% success rate, more than double that of earlier static systems.
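To make the idea concrete, here is a minimal sketch of a dynamic spatio-semantic memory. This is a hypothetical illustration, not the actual DynaMem implementation: it assumes a voxel grid that stores one semantic feature vector per occupied cell (e.g. a CLIP-style embedding), overwrites cells as new observations arrive, deletes cells re-observed as empty (an object moved away), and localizes a text query by cosine similarity against the stored features.

```python
# Hypothetical sketch of a dynamic spatio-semantic memory; the class name,
# methods, and voxel scheme are illustrative assumptions, not DynaMem's API.
import numpy as np

class DynamicSpatioSemanticMemory:
    def __init__(self, voxel_size=0.1):
        self.voxel_size = voxel_size
        self.features = {}   # voxel index (tuple of ints) -> unit feature vector
        self.last_seen = {}  # voxel index -> timestep of last observation

    def _voxel(self, point):
        # Discretize a 3D point into its voxel index.
        return tuple(np.floor(np.asarray(point) / self.voxel_size).astype(int))

    def update(self, points, feats, t):
        # Add or overwrite voxels with the latest observation's features,
        # so the memory tracks objects that appear or move.
        for p, f in zip(points, feats):
            v = self._voxel(p)
            self.features[v] = f / (np.linalg.norm(f) + 1e-8)
            self.last_seen[v] = t

    def forget(self, free_points, t):
        # Remove voxels the robot has re-observed as empty space,
        # so the memory tracks objects that disappear.
        for p in free_points:
            v = self._voxel(p)
            self.features.pop(v, None)
            self.last_seen.pop(v, None)

    def query(self, text_feature):
        # Return the voxel whose stored feature best matches a query
        # embedding (e.g. the text embedding of "red mug").
        q = text_feature / (np.linalg.norm(text_feature) + 1e-8)
        best_v, best_sim = None, -1.0
        for v, f in self.features.items():
            sim = float(q @ f)
            if sim > best_sim:
                best_v, best_sim = v, sim
        return best_v, best_sim
```

A short usage example: after observing two objects, a query returns the best-matching voxel; once the first object's location is re-observed as free, the same query falls back to the remaining match.

```python
mem = DynamicSpatioSemanticMemory(voxel_size=0.5)
mem.update([[0.1, 0.2, 0.0]], [np.array([1.0, 0.0])], t=0)
mem.update([[2.0, 2.0, 0.0]], [np.array([0.0, 1.0])], t=1)
v, _ = mem.query(np.array([1.0, 0.0]))   # finds the first object
mem.forget([[0.1, 0.2, 0.0]], t=2)       # that object moved away
v2, _ = mem.query(np.array([1.0, 0.0]))  # now matches the other voxel
```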

Why it matters?

This research is important because it enhances how robots can operate in everyday environments, making them more useful for tasks like delivery, assistance, or cleaning. By allowing robots to adapt to changes in their surroundings, DynaMem could lead to more effective and versatile robotic applications in homes, offices, and public spaces.

Abstract

Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits the system's applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems. Our code as well as our experiment and deployment videos are open sourced and can be found on our project website: https://dynamem.github.io/