Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Yue Liao, Pengfei Zhou, Siyuan Huang, Donglin Yang, Shengcong Chen, Yuxin Jiang, Yue Hu, Jingbin Cai, Si Liu, Jianlan Luo, Liliang Chen, Shuicheng Yan, Maoqing Yao, Guanghui Ren

2025-08-08


Summary

This paper presents Genie Envisioner, a unified platform that combines policy learning, evaluation, and simulation of robot actions. It uses video generation models and neural networks to help robots follow instructions and manipulate objects better.

What's the problem?

The problem is that current robotic systems often use separate tools for learning, testing, and simulating, which makes it hard to build robots that can understand and interact with the real world effectively and consistently.

What's the solution?

The authors built Genie Envisioner as one integrated system with three parts: a large-scale, instruction-conditioned video model that learns to understand and predict robot movements; a decoder that converts this learned representation into actual robot actions; and a neural simulator that realistically predicts the outcomes of those actions, so they can be trained and evaluated inside the same system.
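To make the three-part loop concrete, here is a minimal toy sketch of how such components could be wired together for closed-loop evaluation. It is not the authors' implementation: every class and function name (ToyWorldModel, ToyActionDecoder, ToyNeuralSimulator, rollout, evaluate) is hypothetical, the instruction is assumed to be encoded directly as a target state, and the "simulator" uses simple additive dynamics instead of generating video.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


class ToyWorldModel:
    """Hypothetical stand-in for an instruction-conditioned video model.
    Toy assumption: the instruction is already a target state, and the
    'latent' is simply the remaining offset to that target."""

    def encode(self, obs: Vector, instruction: Vector) -> Vector:
        return [g - o for g, o in zip(instruction, obs)]


@dataclass
class ToyActionDecoder:
    """Hypothetical decoder: maps the latent to a bounded action."""
    gain: float = 0.5
    max_step: float = 1.0

    def decode(self, latent: Vector) -> Vector:
        # Scale the latent and clip each component to [-max_step, max_step].
        return [max(-self.max_step, min(self.max_step, self.gain * z)) for z in latent]


class ToyNeuralSimulator:
    """Hypothetical action-conditioned simulator. A real neural simulator
    would predict future frames; here the next state is obs + action."""

    def step(self, obs: Vector, action: Vector) -> Vector:
        return [o + a for o, a in zip(obs, action)]


def rollout(world_model, decoder, simulator, obs: Vector,
            instruction: Vector, steps: int) -> List[Vector]:
    """Run the policy (world model + decoder) inside the simulator."""
    trajectory = [obs]
    for _ in range(steps):
        latent = world_model.encode(obs, instruction)
        action = decoder.decode(latent)
        obs = simulator.step(obs, action)
        trajectory.append(obs)
    return trajectory


def evaluate(trajectory: List[Vector], goal: Vector, tol: float = 1e-2) -> bool:
    """Score a rollout: success if the final state is within tol of the goal."""
    final = trajectory[-1]
    return max(abs(f - g) for f, g in zip(final, goal)) < tol


if __name__ == "__main__":
    goal = [2.0, -1.0]
    traj = rollout(ToyWorldModel(), ToyActionDecoder(), ToyNeuralSimulator(),
                   [0.0, 0.0], goal, steps=20)
    print(len(traj), evaluate(traj, goal))
```

The design point the sketch illustrates is that evaluation happens entirely inside the learned simulator: the same policy that would drive a real robot is rolled out against predicted outcomes and scored, rather than being tested in a separately built physics simulator.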

Why does it matter?

This matters because it makes it easier and faster to build robots that can follow complex instructions and perform tasks more accurately, which could make them more useful in real-world settings such as manufacturing, healthcare, and home assistance.

Abstract

Genie Envisioner integrates policy learning, evaluation, and simulation using a video diffusion model and neural simulator for instruction-driven robotic manipulation.
