Aether: Geometric-Aware Unified World Modeling
Aether Team, Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He
2025-03-25
Summary
This paper introduces Aether, a world model meant to give AI a more human-like grasp of 3D space, so that it can understand its surroundings and act within them.
What's the problem?
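AI systems often struggle to understand the physical world and how things move and interact within it. In particular, combining geometric reconstruction with generative modeling remains an open challenge.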
What's the solution?
The researchers developed a new AI framework called Aether that combines 3D reconstruction, video prediction, and planning, allowing the AI to learn about the world and make decisions about how to act within it.
Why does it matter?
This work matters because it could lead to AI systems that are better at navigating and interacting with the real world, such as robots that can perform complex tasks or virtual assistants that can understand and respond to their environment.
Abstract
The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates unprecedented synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Remarkably, even without real-world data, its reconstruction performance far exceeds that of domain-specific models. Additionally, Aether leverages a geometry-informed action space to seamlessly translate predictions into actions, enabling effective autonomous trajectory planning. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling and its applications.
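The abstract does not spell out how the three objectives are interleaved during training. As a rough illustration only, one common way to share a single model across several objectives is to sample one task per training step. The sketch below shows that idea; the function name `interleaved_schedule`, the uniform weights, and the per-step sampling scheme are all assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import Counter

# The three capabilities named in the abstract.
TASKS = ("reconstruction", "prediction", "planning")


def interleaved_schedule(num_steps, weights=(1.0, 1.0, 1.0), seed=0):
    """Pick one task per training step (hypothetical scheme, not from the paper).

    Every step would update the same shared parameters, which is one way
    knowledge could be shared across the objectives.
    """
    rng = random.Random(seed)  # local RNG so the schedule is reproducible
    return rng.choices(TASKS, weights=weights, k=num_steps)


schedule = interleaved_schedule(num_steps=9)
counts = Counter(schedule)  # how often each task was drawn
print(schedule)
print(dict(counts))
```

Changing `weights` would bias training toward one objective; the uniform default here is only a placeholder.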