ODYSSEY: Open-World Quadrupeds Exploration and Manipulation for Long-Horizon Tasks

Kaijun Wang, Liqin Lu, Mingyu Liu, Jianuo Jiang, Zeju Li, Bolin Zhang, Wancai Zheng, Xinyi Yu, Hao Chen, Chunhua Shen

2025-08-25

Summary

This research introduces ODYSSEY, a new system that allows robots – specifically four-legged robots with arms – to perform complex tasks based on spoken or written instructions, even in messy, real-world environments.

What's the problem?

Getting robots to reliably follow language commands and manipulate objects over long tasks is really hard. Existing systems work well on tabletops, but struggle when the robot needs to move around and deal with uneven ground. They also aren't very good at adapting to different arrangements of objects, or at coordinating the robot's body and arm at the same time, which is essential for staying balanced while working.

What's the solution?

The researchers created ODYSSEY, which combines a planner that understands language with a control system that manages the robot's legs and arm together. The planner breaks instructions down into smaller steps and uses a vision-language model to interpret what the robot 'sees' from its own camera. The control system lets the robot move smoothly and keep its balance while manipulating objects. The team also built a benchmark to measure how well such robots perform long tasks in different situations, both in simulation and in the real world.
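To make the "break instructions into smaller steps" idea concrete, here is a minimal sketch of hierarchical planning in this style. All names are hypothetical, and a toy rule-based `decompose` function stands in for the paper's vision-language-model planner; the real system grounds each step in camera observations.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    skill: str    # low-level skill to invoke: "navigate", "grasp", or "place"
    target: str   # object or location the skill acts on

def decompose(instruction: str) -> list[Subtask]:
    """Stand-in for the VLM planner: split a fetch-style instruction
    into navigate -> grasp -> navigate -> place subtasks."""
    # Toy parse for instructions like "bring me the mug from the counter".
    words = instruction.lower().split()
    obj = words[words.index("the") + 1]   # word after the first "the"
    source = words[-1]                    # last word names the location
    return [
        Subtask("navigate", source),
        Subtask("grasp", obj),
        Subtask("navigate", "user"),
        Subtask("place", obj),
    ]

def execute(plan: list[Subtask]) -> list[str]:
    """Stand-in for the whole-body controller: log each skill invocation.
    The real controller coordinates legs and arm for every subtask."""
    return [f"{t.skill}({t.target})" for t in plan]

plan = decompose("bring me the mug from the counter")
print(execute(plan))
# → ['navigate(counter)', 'grasp(mug)', 'navigate(user)', 'place(mug)']
```

The key design point this illustrates is the separation of concerns: the planner only decides *what* to do next in semantic terms, while balance and terrain handling are delegated entirely to the low-level whole-body policy.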

Why it matters?

This work is a big step towards creating robots that can be genuinely helpful assistants in our everyday lives. Imagine a robot that can understand your request to 'bring me the red mug from the kitchen counter' and actually do it, even if the kitchen is cluttered and the floor isn't perfectly level. This research makes that kind of robot more achievable.

Abstract

Language-guided long-horizon mobile manipulation has long been a grand challenge in embodied semantic reasoning, generalizable manipulation, and adaptive locomotion. Three fundamental limitations hinder progress: First, although large language models have improved spatial reasoning and task planning through semantic priors, existing implementations remain confined to tabletop scenarios, failing to address the constrained perception and limited actuation ranges of mobile platforms. Second, current manipulation strategies exhibit insufficient generalization when confronted with the diverse object configurations encountered in open-world environments. Third, while crucial for practical deployment, the dual requirement of maintaining high platform maneuverability alongside precise end-effector control in unstructured settings remains understudied. In this work, we present ODYSSEY, a unified mobile manipulation framework for agile quadruped robots equipped with manipulators, which seamlessly integrates high-level task planning with low-level whole-body control. To address the challenge of egocentric perception in language-conditioned tasks, we introduce a hierarchical planner powered by a vision-language model, enabling long-horizon instruction decomposition and precise action execution. At the control level, our novel whole-body policy achieves robust coordination across challenging terrains. We further present the first benchmark for long-horizon mobile manipulation, evaluating diverse indoor and outdoor scenarios. Through successful sim-to-real transfer, we demonstrate the system's generalization and robustness in real-world deployments, underscoring the practicality of legged manipulators in unstructured environments. Our work advances the feasibility of generalized robotic assistants capable of complex, dynamic tasks. Our project page: https://kaijwang.github.io/odyssey.github.io/