Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots

Yushi Wang, Changsheng Luo, Penghui Chen, Jianran Liu, Weijian Sun, Tong Guo, Kechang Yang, Biao Hu, Yangang Zhang, Mingguo Zhao

2025-11-07

Summary

This research focuses on making humanoid robots better at playing soccer, a tough challenge for artificial intelligence because it requires a robot to react quickly to what it sees and move accordingly.

What's the problem?

Currently, robots trying to play soccer often struggle because their 'brains' are split into separate modules for seeing, thinking, and moving. This separation causes delays and makes their actions look clumsy and uncoordinated, especially when the game moves fast. On top of that, real-world cameras are noisy and imperfect, adding another layer of difficulty.

What's the solution?

The researchers created a single, unified 'brain' for the robot using a technique called reinforcement learning. This 'brain' directly connects what the robot sees with how it moves, allowing for much faster reactions. They also used a special system that simulates realistic camera images, helping the robot learn to deal with imperfect vision and coordinate its actions effectively. It's like teaching the robot to understand what it's seeing and how to respond all at once.
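The idea of one network that both estimates the true state from imperfect vision and outputs actions can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the dimensions, the random weights, and the noise model standing in for the "virtual perception system" are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper does not specify these.
OBS_DIM = 16     # imperfect visual features per frame
PRIV_DIM = 6     # privileged state, e.g. true ball position/velocity
LATENT_DIM = 8
ACT_DIM = 12     # joint-level action

# Random weights stand in for a trained policy.
W_enc = rng.normal(0, 0.1, (OBS_DIM, LATENT_DIM))
W_dec = rng.normal(0, 0.1, (LATENT_DIM, PRIV_DIM))
W_act = rng.normal(0, 0.1, (LATENT_DIM, ACT_DIM))

def virtual_perception(true_state):
    """Toy stand-in for a virtual perception system: renders the
    privileged state into noisy features with occasional dropouts."""
    feats = np.tile(true_state, OBS_DIM // PRIV_DIM + 1)[:OBS_DIM]
    feats = feats + rng.normal(0, 0.05, OBS_DIM)  # sensor noise
    if rng.random() < 0.2:                        # intermittent detection loss
        feats[:] = 0.0
    return feats

def policy(obs):
    """Encoder-decoder policy: one shared latent feeds both the
    privileged-state reconstruction head and the action head."""
    z = np.tanh(obs @ W_enc)       # encoder on imperfect observations
    recon = z @ W_dec              # decoder: estimate of privileged state
    action = np.tanh(z @ W_act)    # actor head: bounded motor command
    return action, recon

true_state = rng.normal(0, 1, PRIV_DIM)
obs = virtual_perception(true_state)
action, recon = policy(obs)
print(action.shape, recon.shape)  # (12,) (6,)
```

During training, a reconstruction loss between `recon` and the simulator's privileged state would push the latent to encode what the robot cannot directly observe; here the weights are untrained, so only the structure is shown.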

Why it matters?

This work is important because it shows a way to build robots that can react quickly and intelligently to the real world, not just in controlled environments. Successfully applying this to soccer, a dynamic and complex game, demonstrates progress towards robots that can handle similar challenges in other areas like search and rescue, or even assisting people in everyday tasks.

Abstract

Humanoid soccer poses a representative challenge for embodied intelligence, requiring robots to operate within a tightly coupled perception-action loop. However, existing systems typically rely on decoupled modules, resulting in delayed responses and incoherent behaviors in dynamic environments, while real-world perceptual limitations further exacerbate these issues. In this work, we present a unified reinforcement learning-based controller that enables humanoid robots to acquire reactive soccer skills through the direct integration of visual perception and motion control. Our approach extends Adversarial Motion Priors to perceptual settings in real-world dynamic environments, bridging motion imitation and visually grounded dynamic control. We introduce an encoder-decoder architecture combined with a virtual perception system that models real-world visual characteristics, allowing the policy to recover privileged states from imperfect observations and establish active coordination between perception and action. The resulting controller demonstrates strong reactivity, consistently executing coherent and robust soccer behaviors across various scenarios, including real RoboCup matches.
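The abstract builds on Adversarial Motion Priors, where a discriminator trained to tell reference motion from policy motion supplies a "style" reward alongside the task reward. A minimal sketch of that reward shaping, assuming the common least-squares formulation and hypothetical weights not taken from this paper:

```python
def amp_style_reward(d_score: float) -> float:
    """Least-squares AMP-style reward: discriminator scores near 1
    (reference-like motion) give rewards near 1; scores far from 1 give 0."""
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)

def total_reward(r_task: float, d_score: float,
                 w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Hypothetical weighting of a task reward (e.g. progress toward the
    ball) and the motion-prior style reward; weights are assumptions."""
    return w_task * r_task + w_style * amp_style_reward(d_score)

print(amp_style_reward(1.0))   # 1.0  (motion indistinguishable from reference)
print(amp_style_reward(-1.0))  # 0.0  (motion clearly off-distribution)
```

Extending this to perceptual settings, as the abstract describes, means the policy being scored must act from imperfect visual observations rather than privileged simulator state, so coherent motion must survive noisy perception.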