HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents
Yibin Liu, Zhixuan Liang, Zanxin Chen, Tianxing Chen, Mengkang Hu, Wanxi Dong, Congsheng Xu, Zhaoming Han, Yusen Qin, Yao Mu
2025-08-06
Summary
This paper introduces HyCodePolicy, a system that helps robots and AI agents follow instructions more reliably by combining code generation, an understanding of physical space, real-time monitoring of what is happening, and automatic error correction.
What's the problem?
The problem is that existing AI agents often struggle to complete tasks because they do not notice when something goes wrong during execution, and they have no way to diagnose and fix errors while they are working.
What's the solution?
HyCodePolicy breaks an instruction into smaller subgoals and turns them into code that guides the robot's actions in simulation. While the robot works, a vision-language model watches the execution to spot mistakes and diagnose what went wrong, letting the system repair its own program with little human help (see the sketch below).
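To make this loop concrete, here is a minimal Python sketch of the monitor-and-repair structure described above. It is a rough illustration under our own assumptions, not the authors' implementation: every name (decompose_instruction, generate_program, execute_in_sim, vlm_check_step, repair_program) is a hypothetical placeholder standing in for the paper's LLM, simulator, and vision-language model components.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of a closed-loop generate -> execute -> monitor -> repair
# pipeline. All functions below are illustrative stubs, not the authors' API.

@dataclass
class StepResult:
    subgoal: str
    symbolic_ok: bool   # did the simulator's logs report success?
    frame: object       # camera observation for the VLM to inspect

def decompose_instruction(instruction: str) -> List[str]:
    # Placeholder: an LLM would split the instruction into subgoals.
    return [part.strip() for part in instruction.split(", then ")]

def generate_program(subgoals: List[str]) -> List[str]:
    # Placeholder: an LLM would emit executable robot code for each subgoal.
    return [f"robot.do('{goal}')" for goal in subgoals]

def execute_in_sim(program: List[str]) -> List[StepResult]:
    # Placeholder: run each step in simulation, recording logs and observations.
    return [StepResult(subgoal=step, symbolic_ok=True, frame=None) for step in program]

def vlm_check_step(step: StepResult) -> bool:
    # Placeholder: a vision-language model judges whether the step really succeeded,
    # catching failures the symbolic log misses (e.g. an object slipping mid-transfer).
    return step.symbolic_ok

def repair_program(program: List[str], failures: List[StepResult]) -> List[str]:
    # Placeholder: failure descriptions are fed back to the code generator for a fix.
    return program

def run_hybrid_policy(instruction: str, max_repairs: int = 3) -> bool:
    program = generate_program(decompose_instruction(instruction))
    for _ in range(max_repairs + 1):
        trace = execute_in_sim(program)
        failures = [s for s in trace if not vlm_check_step(s)]
        if not failures:
            return True          # every subgoal passed both the logs and the VLM check
        program = repair_program(program, failures)
    return False                 # still failing after the repair budget is spent

if __name__ == "__main__":
    print(run_hybrid_policy("pick up the mug, then place it on the tray"))
```

The key design choice the sketch tries to capture is the hybrid check: symbolic execution logs and a perceptual judgment from the vision-language model are combined at each step, so the system can notice failures that either signal alone would miss and trigger another round of repair.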
Why it matters?
This matters because it makes robots and AI agents more reliable at completing tasks on their own, which can improve automation in settings where people cannot always step in to intervene.
Abstract
HyCodePolicy integrates code synthesis, geometric grounding, perceptual monitoring, and iterative repair to enhance the robustness and efficiency of embodied agent policies.