GenEx: Generating an Explorable World
Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen
2024-12-16

Summary
This paper introduces GenEx, a system that lets AI generate and explore detailed 3D environments from just a single image, simulating how humans navigate and understand the world.
What's the problem?
Understanding and navigating 3D spaces is a big challenge for AI. Traditional methods require a lot of data and often struggle to create realistic environments. They also don't effectively mimic how humans can imagine and explore unseen parts of their surroundings.
What's the solution?
GenEx addresses this problem with a generative approach that builds an entire 3D-consistent world from minimal input, such as a single RGB image. Its generative model, trained on 3D world data curated from Unreal Engine, produces panoramic video streams that let AI agents explore and interact with the environment as if it were real. This enables an agent to plan its actions based on both what it sees and what it imagines about the world around it.
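The panoramic representation is what lets an agent look around freely. As a minimal sketch, assuming each frame is stored as an equirectangular panorama (an H x W image whose columns evenly span 360 degrees of longitude, increasing to the right), turning the agent in place by a yaw angle is just a circular shift of image columns. The function name `rotate_equirect_yaw` is hypothetical and is not taken from GenEx's code.

```python
import numpy as np


def rotate_equirect_yaw(pano: np.ndarray, yaw_deg: float) -> np.ndarray:
    """Turn the viewer in place by yaw_deg degrees (positive = turn right).

    Assumes `pano` is an equirectangular image of shape (H, W) or (H, W, C)
    whose W columns evenly cover 360 degrees of longitude, increasing to the
    right. A pure yaw rotation then maps to a circular column shift, rounded
    to the nearest whole column.
    """
    width = pano.shape[1]
    shift = int(round(yaw_deg / 360.0 * width)) % width
    # out[:, i] = pano[:, (i + shift) % width]: image content slides left as we turn right
    return np.roll(pano, -shift, axis=1)


pano = np.arange(8).reshape(1, 8)      # toy 1 x 8 "panorama"
print(rotate_equirect_yaw(pano, 90))   # [[2 3 4 5 6 7 0 1]]
```

Because the shift is circular, four 90-degree turns restore the original panorama exactly. In this representation, rotating in place needs no new content; generation is only needed when the camera moves through the scene.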
Why it matters?
This research is important because it advances how AI can understand and interact with complex environments. By letting AI generate and explore virtual worlds dynamically, GenEx opens up new possibilities for applications in gaming, virtual reality, training simulations, and robotics, making interactions with digital spaces more intuitive and human-like.
Abstract
Understanding, navigating, and exploring the 3D physical real world has long been a central challenge in the development of artificial intelligence. In this work, we take a step toward this goal by introducing GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination that forms priors (expectations) about the surrounding environments. GenEx generates an entire 3D-consistent imaginative environment from as little as a single RGB image, bringing it to life through panoramic video streams. Leveraging scalable 3D world data curated from Unreal Engine, our generative model is grounded in the physical world. It captures a continuous 360-degree environment with little effort, offering a boundless landscape for AI agents to explore and interact with. GenEx achieves high-quality world generation and robust loop consistency over long trajectories, and it demonstrates strong 3D capabilities such as consistency and active 3D mapping. Powered by generative imagination of the world, GPT-assisted agents are equipped to perform complex embodied tasks, including both goal-agnostic exploration and goal-driven navigation. These agents use predictive expectations regarding unseen parts of the physical world to refine their beliefs, simulate different outcomes based on potential decisions, and make more informed choices. In summary, we demonstrate that GenEx provides a transformative platform for advancing embodied AI in imaginative spaces and has the potential to extend these capabilities to real-world exploration.
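Loop consistency can be checked by sending the agent along a closed trajectory that ends at its starting pose and comparing the first and last generated frames: a perfectly consistent world model would reproduce the starting view exactly. The sketch below uses pixel-space PSNR for that comparison; this is an illustrative choice, and the metric GenEx actually reports may differ. `loop_consistency_psnr` is a hypothetical helper name.

```python
import numpy as np


def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def loop_consistency_psnr(frames, max_val: float = 1.0) -> float:
    """PSNR between the first and last frame of a closed-loop trajectory.

    Assumes `frames` were generated along a path whose final pose equals
    its starting pose; higher PSNR means the generated world returned
    more faithfully to where it started.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    return psnr(frames[0], frames[-1], max_val)


start = np.zeros((4, 8, 3))           # toy frames with values in [0, 1]
end = np.full((4, 8, 3), 0.1)
print(loop_consistency_psnr([start, end]))  # about 20 dB
```

Averaging this score over many closed trajectories of increasing length would show whether consistency degrades as drift accumulates.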