UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

Hanjung Kim, Jaehyun Kang, Hyolim Kang, Meedeum Cho, Seon Joo Kim, Youngwoon Lee

2025-05-15

Summary

This paper introduces UniSkill, an AI system that watches videos of humans performing tasks and then teaches robots to perform those same tasks, even when the robot's body is very different from a human's.

What's the problem?

It's hard for robots to learn new skills just by watching humans: robots and humans have different shapes, sizes, and ways of moving, so motions that work for a person often don't transfer directly to a robot body.

What's the solution?

The researchers developed UniSkill, which learns the core of a skill from video, regardless of who or what is performing it. By training on large amounts of unlabeled video of both humans and robots, the system builds a shared, embodiment-agnostic skill representation. A robot policy conditioned on that representation can then imitate skills shown in human videos, both in simulation and in the real world, without needing large amounts of labeled training data.
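
To make the idea concrete, here is a minimal, hypothetical sketch of how an embodiment-agnostic skill latent could be learned from unlabeled video: a skill encoder compresses a pair of frames into a latent z, and a frame predictor must reconstruct the later frame from the earlier frame plus z. Because prediction only needs to know what changed in the scene, z tends to capture the skill rather than the body performing it. Everything below (the module names SkillEncoder and FramePredictor, the tiny network sizes, the frame pairing, and the plain MSE objective) is an illustrative assumption in the spirit of the paper's description, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillEncoder(nn.Module):
    """(frame_t, frame_t+k) -> skill latent z. Because it only sees pixels,
    the same encoder is applied to human clips and robot clips alike."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, frame_t, frame_tk):
        # Stack the two frames channel-wise: (B, 6, H, W) -> (B, latent_dim)
        return self.net(torch.cat([frame_t, frame_tk], dim=1))

class FramePredictor(nn.Module):
    """(frame_t, z) -> predicted frame_t+k. Training this prediction forces
    z to carry the change between frames, i.e. the skill."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.z_proj = nn.Linear(latent_dim, 64)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 32
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),               # 32 -> 64
        )

    def forward(self, frame_t, z):
        feat = self.enc(frame_t)                                  # (B, 64, 16, 16)
        zmap = self.z_proj(z)[:, :, None, None].expand_as(feat)   # broadcast z
        return self.dec(torch.cat([feat, zmap], dim=1))

# One self-supervised training step on unlabeled video (no action labels):
encoder, predictor = SkillEncoder(), FramePredictor()
opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-4)
frame_t = torch.rand(8, 3, 64, 64)   # placeholder for real human/robot frames
frame_tk = torch.rand(8, 3, 64, 64)  # the frame k steps later in the same clip
z = encoder(frame_t, frame_tk)
loss = F.mse_loss(predictor(frame_t, z), frame_tk)
opt.zero_grad(); loss.backward(); opt.step()
# At deployment, a skill-conditioned robot policy pi(action | obs, z) would
# follow the sequence of z's extracted from a human demonstration video.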

Why it matters?

This matters because it makes teaching robots new tasks as simple as showing them videos, which could make robots far more useful and adaptable in factories, hospitals, and even homes.

Abstract

UniSkill learns embodiment-agnostic skill representations from unlabeled cross-embodiment video data, enabling human skill transfer to robots in both simulation and real-world settings.