Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control

NVIDIA, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu

2025-03-19

Summary

This paper introduces Cosmos-Transfer, an AI model that generates realistic-looking simulated worlds guided by spatial control inputs such as segmentation maps, depth maps, and edge maps.

What's the problem?

Creating realistic simulations is hard, especially when you want to control specific details of the world.

What's the solution?

Cosmos-Transfer lets users steer the generated world with several control inputs at once and weight each input differently at different spatial locations. With an inference scaling strategy, it can also generate these worlds in real time on an NVIDIA GB200 NVL72 rack.

Why does it matter?

This work matters because it can produce training data for Physical AI: it can turn simulated robotics scenes into realistic ones (Sim2Real) and enrich data for self-driving cars. The models and code are open source, so other researchers can build on them.

Abstract

We introduce Cosmos-Transfer, a conditional world generation model that can generate world simulations based on multiple spatial control inputs of various modalities such as segmentation, depth, and edge. In the design, the spatial conditional scheme is adaptive and customizable. It allows weighting different conditional inputs differently at different spatial locations. This enables highly controllable world generation and finds use in various world-to-world transfer use cases, including Sim2Real. We conduct extensive evaluations to analyze the proposed model and demonstrate its applications for Physical AI, including robotics Sim2Real and autonomous vehicle data enrichment. We further demonstrate an inference scaling strategy to achieve real-time world generation with an NVIDIA GB200 NVL72 rack. To help accelerate research development in the field, we open-source our models and code at https://github.com/nvidia-cosmos/cosmos-transfer1.
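
The spatially weighted conditioning described in the abstract can be pictured with a toy sketch. This is not the paper's implementation: the function name `fuse_controls`, the per-location weight normalization, and the use of plain scalar grids in place of learned feature maps are all assumptions made for illustration.

```python
def fuse_controls(features, weights, eps=1e-8):
    """Blend per-modality maps using per-location weights (illustrative only).

    features: dict mapping modality name -> H x W grid (list of lists of floats)
    weights:  dict mapping modality name -> H x W grid of non-negative weights
    Returns an H x W grid where each cell is the weight-normalized sum
    of the modality values at that location.
    """
    names = list(features)
    height = len(features[names[0]])
    width = len(features[names[0]][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = sum(weights[n][y][x] for n in names)
            if total < eps:
                continue  # no modality is active here; leave the cell at 0
            fused[y][x] = sum(
                weights[n][y][x] * features[n][y][x] for n in names
            ) / total
    return fused


# Toy 1 x 2 "image": depth dominates the left cell, edge the right cell.
features = {"depth": [[1.0, 1.0]], "edge": [[3.0, 3.0]]}
weights = {"depth": [[1.0, 0.0]], "edge": [[0.0, 1.0]]}
fused = fuse_controls(features, weights)
```

In this toy case the depth input alone determines the left cell and the edge input alone determines the right cell. Giving both inputs equal weight everywhere would instead yield the per-cell average of the two maps.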