LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

Lihuang Chen, Xiangyu Luo, Jun Meng

2025-12-15

Summary

This paper introduces LEO-RobotAgent, a new system that lets robots understand and follow instructions given in natural language. It's designed to work with many different kinds of robots and handle complicated, unexpected tasks.

What's the problem?

Currently, most robot systems that use large language models (LLMs) are built for very specific tasks and only work with one type of robot. These systems are often complicated and don't easily adapt to new situations or different robots. It's hard to get robots to truly understand what a person wants them to do and work *with* people effectively.

What's the solution?

LEO-RobotAgent provides a simple, clearly structured framework through which an LLM controls a robot. The LLM is given a modular set of tools and chooses which ones to call to complete a task. The system also supports back-and-forth communication between the robot and a human, so they can collaborate. The researchers showed that the framework can be used with drones, robotic arms, and wheeled robots without major changes.

Why does it matter?

This work is important because it moves us closer to robots that are truly helpful and adaptable. By making it easier to program robots with language and allowing them to work with humans, we can unlock a wider range of applications for robots in everyday life, from assisting in homes to helping in complex industrial settings.

Abstract

We propose LEO-RobotAgent, a general-purpose language-driven intelligent agent framework for robots. Under this framework, LLMs can operate different types of robots to complete unpredictable, complex tasks across various scenarios. The framework features strong generalization, robustness, and efficiency. The application-level system built around it can enhance bidirectional human-robot intent understanding and lower the threshold for human-robot interaction. In robot task planning, the vast majority of existing studies focus on applying large models to single-task scenarios and single robot types. These algorithms often have complex structures and lack generalizability. The proposed LEO-RobotAgent framework is therefore designed with as streamlined a structure as possible, enabling large models to think, plan, and act independently within this clear framework. We provide a modular and easily registrable toolset, allowing large models to flexibly call various tools to meet different requirements. The framework also incorporates a human-robot interaction mechanism, enabling the algorithm to collaborate with humans like a partner. Experiments have verified that the framework can be easily adapted to mainstream robot platforms, including unmanned aerial vehicles (UAVs), robotic arms, and wheeled robots, and can efficiently execute a variety of carefully designed tasks of varying complexity. Our code is available at https://github.com/LegendLeoChen/LEO-RobotAgent.
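
To make the idea of a "modular and easily registrable toolset" concrete, here is a minimal Python sketch of a tool registry driven by a planning loop. It is not the authors' implementation and does not reflect the API of the linked repository. Every name in it (`register_tool`, `TOOLS`, `scripted_planner`, `run_agent`, and the three example tools) is hypothetical, and a fixed script stands in for the LLM so the example runs without any model or robot.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry mapping tool names to callables.
# Illustrative only; not the LEO-RobotAgent API.
TOOLS: Dict[str, Callable[..., str]] = {}

Action = Tuple[str, dict]          # (tool name, keyword arguments)
Step = Tuple[str, str]             # (tool name, observation)


def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return decorator


@register_tool("takeoff")
def takeoff(altitude_m: float) -> str:
    # A real tool would command the robot; here we only report the result.
    return f"reached {altitude_m} m"


@register_tool("move_to")
def move_to(x: float, y: float) -> str:
    return f"arrived at ({x}, {y})"


@register_tool("ask_human")
def ask_human(question: str) -> str:
    # Stand-in for human-robot interaction; a real system would wait for input.
    return "yes, proceed"


def scripted_planner(history: List[Step]) -> Optional[Action]:
    """Stand-in for an LLM: returns the next action, or None when done."""
    plan: List[Action] = [
        ("takeoff", {"altitude_m": 2.0}),
        ("ask_human", {"question": "Fly to the marker?"}),
        ("move_to", {"x": 1.0, "y": 3.0}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(planner: Callable[[List[Step]], Optional[Action]],
              max_steps: int = 10) -> List[Step]:
    """Ask the planner for an action, run the tool, record the observation."""
    history: List[Step] = []
    for _ in range(max_steps):
        action = planner(history)
        if action is None:
            break
        name, kwargs = action
        tool = TOOLS.get(name)
        observation = tool(**kwargs) if tool else f"unknown tool: {name}"
        history.append((name, observation))
    return history


if __name__ == "__main__":
    for name, obs in run_agent(scripted_planner):
        print(name, "->", obs)
```

In this sketch, supporting a different robot type only means registering a different set of functions; the loop itself is unchanged. This is one plausible reading of how a single framework could span UAVs, robotic arms, and wheeled robots.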