ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar
2024-07-02

Summary
This paper introduces ROS-LLM, a framework that lets non-experts program robots using natural language. It combines the Robot Operating System (ROS) with large language models (LLMs) so that users can describe tasks to a robot in plain language.
What's the problem?
Programming robots typically requires specialized knowledge and expertise, which can be a barrier for many people who want to use robots for various tasks. This complexity makes it hard for non-experts to interact with and control robots effectively, limiting their use in everyday situations.
What's the solution?
To solve this problem, the authors developed the ROS-LLM framework, which lets users give instructions to robots in everyday language through a chat interface. The system integrates large language models (LLMs) that interpret these instructions and turn them into robot behaviors. It supports three ways of structuring a task (sequences, behavior trees, and state machines) and includes imitation learning, where users teach the robot new actions by demonstrating them. The LLM can also revise its plan based on feedback from the user and the environment. Together, these features let people program robots without deep technical skills.
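To make the difference between a plain sequence and a behavior tree concrete, here is a minimal sketch in Python. It is not taken from the ROS-LLM code base: the class names (`Action`, `Sequence`, `Fallback`), the helper `make_action`, and the action names are all hypothetical, and the actions only record that they ran instead of calling a robot.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical illustration only; none of these names come from ROS-LLM.
log: List[str] = []  # records the order in which actions were attempted


@dataclass
class Action:
    """Leaf node: wraps a callable that returns True on success."""
    name: str
    fn: Callable[[], bool]

    def tick(self) -> bool:
        return self.fn()


@dataclass
class Sequence:
    """Runs children in order and stops at the first failure."""
    children: List = field(default_factory=list)

    def tick(self) -> bool:
        return all(child.tick() for child in self.children)


@dataclass
class Fallback:
    """Tries children in order and stops at the first success."""
    children: List = field(default_factory=list)

    def tick(self) -> bool:
        return any(child.tick() for child in self.children)


def make_action(name: str, succeeds: bool = True) -> Action:
    """Build a stand-in action that logs its name and reports a fixed result."""
    def fn() -> bool:
        log.append(name)
        return succeeds
    return Action(name, fn)


# A plain sequence would fail as soon as "grasp_top" fails. The behavior
# tree below recovers by falling back to a second grasp strategy.
tree = Sequence([
    make_action("move_to_cup"),
    Fallback([
        make_action("grasp_top", succeeds=False),
        make_action("grasp_side"),
    ]),
    make_action("place_on_tray"),
])

result = tree.tick()
```

In this sketch the tree still succeeds even though the first grasp attempt fails, which is the kind of recovery a flat sequence cannot express.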
Why does it matter?
This research matters because it makes robot programming accessible to a wider audience. By letting non-experts control robots with plain-language commands, ROS-LLM could enable new uses of robots in homes, schools, and industry, and make robotic assistance more common in daily life.
Abstract
We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
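The abstract's step of extracting a behavior from the LLM output and executing it could look roughly like the sketch below. This is an illustration under stated assumptions, not the framework's actual format or API: the `<behavior>` delimiter, the JSON step schema, and the functions `extract_behavior` and `run_sequence` are hypothetical, and the entries in `ACTIONS` stand in for what would be ROS action or service calls in a real system.

```python
import json
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical action library. In a real system each entry would wrap a
# ROS action or service call; here the actions only record that they ran.
executed: List[str] = []


def move_to(target: str) -> bool:
    executed.append(f"move_to({target})")
    return True


def grasp(target: str) -> bool:
    executed.append(f"grasp({target})")
    return True


ACTIONS: Dict[str, Callable[[str], bool]] = {"move_to": move_to, "grasp": grasp}


def extract_behavior(llm_output: str) -> List[dict]:
    """Pull a JSON step list out of an assumed <behavior>...</behavior> block."""
    match = re.search(r"<behavior>(.*?)</behavior>", llm_output, re.DOTALL)
    if match is None:
        raise ValueError("no behavior block found in LLM output")
    steps = json.loads(match.group(1))
    for step in steps:
        # Reject steps that are not in the library before running anything.
        if step["action"] not in ACTIONS:
            raise ValueError(f"unknown action: {step['action']}")
    return steps


def run_sequence(steps: List[dict]) -> Tuple[bool, str]:
    """Run steps in order; return success plus a feedback message for the LLM."""
    for step in steps:
        ok = ACTIONS[step["action"]](step["target"])
        if not ok:
            return False, f"step {step['action']}({step['target']}) failed"
    return True, "all steps succeeded"


reply = (
    "Sure, here is the plan.\n"
    '<behavior>[{"action": "move_to", "target": "cup"}, '
    '{"action": "grasp", "target": "cup"}]</behavior>'
)
steps = extract_behavior(reply)
ok, feedback = run_sequence(steps)
```

The feedback string returned by `run_sequence` is the kind of message that could be passed back to the LLM, together with human comments, so it can revise the plan.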