Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Enshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wang

2024-12-06

Summary

This paper introduces Code-as-Monitor (CaM), a system that helps robots detect and prevent failures in real time by combining visual programming with constraint satisfaction.

What's the problem?

In robotic systems, it's important to quickly identify and address unexpected failures while also preventing problems that can be anticipated. However, existing methods often struggle to do both at the same time, leading to inefficiencies and potential errors in robotic operations.

What's the solution?

CaM uses a vision-language model (VLM) to generate code that monitors robotic tasks. It casts both reactive and proactive failure detection as a single set of spatio-temporal constraint satisfaction problems, and the VLM-generated code evaluates those constraints in real time. To keep monitoring accurate and efficient, the system abstracts the entities involved in a constraint, or their parts, into compact geometric elements, which are simpler to track and also serve as visual prompts for the VLM. In experiments across three simulators and a real-world setting, CaM achieved a 28.7% higher success rate and a 31.8% shorter execution time than baselines under severe disturbances.
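
To make the idea of checking a constraint over geometric elements concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Point` type, the `held_object_constraint` rule, the 5 cm threshold, and the hand-written frames are all assumptions chosen for illustration, standing in for elements that a tracker would supply and for code that a VLM would generate.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Point:
    """Hypothetical constraint element: an entity reduced to one 3D point (metres)."""
    x: float
    y: float
    z: float


def distance(a: Point, b: Point) -> float:
    """Euclidean distance between two point elements."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def held_object_constraint(gripper: Point, obj: Point, max_gap: float = 0.05) -> bool:
    """Assumed spatial constraint: the object stays within max_gap of the gripper."""
    return distance(gripper, obj) <= max_gap


def monitor(
    frames: Sequence[Tuple[Point, Point]],
    constraint: Callable[[Point, Point], bool],
) -> Optional[int]:
    """Return the index of the first frame that violates the constraint, or None."""
    for i, (gripper, obj) in enumerate(frames):
        if not constraint(gripper, obj):
            return i
    return None


# Hand-written tracked positions; in the last frame the object has been dropped.
frames = [
    (Point(0.0, 0.0, 0.30), Point(0.0, 0.0, 0.28)),
    (Point(0.1, 0.0, 0.30), Point(0.1, 0.0, 0.28)),
    (Point(0.2, 0.0, 0.30), Point(0.1, 0.0, 0.00)),
]
first_failure = monitor(frames, held_object_constraint)
```

Because each entity is reduced to a point, the check is a single distance comparison per frame, which is the kind of cheap evaluation that makes per-frame monitoring plausible.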

Why does it matter?

This research is important because it enhances the reliability and efficiency of robotic systems, making them better at handling unexpected challenges in dynamic environments. By improving how robots detect and respond to failures, CaM can lead to safer and more effective automation in various fields, such as manufacturing, healthcare, and service industries.

Abstract

Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.
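
The closing point about turning an open-loop policy into a closed-loop system can be pictured with a short Python sketch. Everything in it is hypothetical: `run_closed_loop`, the retry strategy, the `constraint_satisfied` state key, and the simulated `fake_execute` policy are assumptions for illustration, and the paper may recover from failures differently.

```python
from typing import Callable, Dict, Iterable, List


def run_closed_loop(
    plan: Iterable[str],
    execute: Callable[[str], Dict[str, bool]],
    constraint_ok: Callable[[Dict[str, bool]], bool],
    max_retries: int = 2,
) -> List[str]:
    """Run an open-loop plan step by step, retrying a step when the monitor flags it."""
    log: List[str] = []
    for step in plan:
        for attempt in range(1, max_retries + 2):
            state = execute(step)          # open-loop policy acts without feedback
            if constraint_ok(state):       # monitor closes the loop
                log.append(f"{step}: ok")
                break
            log.append(f"{step}: violation (attempt {attempt})")
        else:
            # All attempts violated the constraint; stop the task.
            log.append(f"{step}: failed")
            break
    return log


# Simulated policy (assumption): the first grasp slips, the second succeeds.
attempts = {"grasp": 0}


def fake_execute(step: str) -> Dict[str, bool]:
    if step == "grasp":
        attempts["grasp"] += 1
        return {"constraint_satisfied": attempts["grasp"] > 1}
    return {"constraint_satisfied": True}


log = run_closed_loop(
    ["reach", "grasp", "lift"],
    fake_execute,
    lambda state: state["constraint_satisfied"],
)
```

In this toy run the log records one violation on the grasp step followed by a successful retry, after which the plan continues to the lift step.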