Vid2World: Crafting Video Diffusion Models to Interactive World Models

Siqiao Huang, Jialong Wu, Qixing Zhou, Shangchen Miao, Mingsheng Long

2025-05-22

Summary

This paper introduces Vid2World, a method that takes AI models trained to generate videos and turns them into interactive world models, which can simulate how a virtual environment changes in response to actions.

What's the problem?

Most video generation models produce clips passively: they are trained without action inputs and generate all frames together rather than one step at a time, so they can't react to an agent's choices in a virtual environment. This limits their usefulness for things like games, planning, and training robots.

What's the solution?

The researchers start from a pre-trained video diffusion model and adapt it in two ways: they causalize it so that each frame is predicted only from earlier frames, letting the model roll out a video step by step, and they add action guidance so that each frame's prediction can be conditioned on the action taken at that step. Together these changes turn a passive video generator into a model that can not only show what could happen but also respond to actions in complex virtual settings; a minimal sketch of both ideas follows.
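
The sketch below is purely illustrative and not the authors' code: it shows, under assumed module and parameter names (CausalActionTemporalBlock, action_dropout, null_action), how a temporal attention layer can be made causal so frame t only attends to earlier frames, and how per-frame action embeddings can be randomly dropped during training, which later enables guidance on actions at inference time.

```python
# Illustrative sketch only (hypothetical names, not the paper's implementation).
import torch
import torch.nn as nn


class CausalActionTemporalBlock(nn.Module):
    def __init__(self, dim: int, num_actions: int, num_heads: int = 8,
                 action_dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Extra embedding index `num_actions` serves as the "no action" token.
        self.action_embed = nn.Embedding(num_actions + 1, dim)
        self.null_action = num_actions
        self.action_dropout = action_dropout

    def forward(self, x: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) per-frame latent features
        # actions: (batch, frames) discrete action taken before each frame
        b, t, _ = x.shape

        if self.training and self.action_dropout > 0:
            # Independently drop each action so the model also learns an
            # action-unconditional prediction (useful for guidance later).
            drop = torch.rand(b, t, device=actions.device) < self.action_dropout
            actions = torch.where(
                drop, torch.full_like(actions, self.null_action), actions)

        h = self.norm(x + self.action_embed(actions))

        # Causal mask: frame i may attend only to frames j <= i.
        causal_mask = torch.triu(
            torch.ones(t, t, dtype=torch.bool, device=x.device), diagonal=1)
        out, _ = self.attn(h, h, h, attn_mask=causal_mask)
        return x + out
```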

Why it matters?

This matters because it opens the door to more advanced simulations, smarter virtual assistants, and better training tools for robots and AI, all of which need to interact with and control their environments, not just watch them.

Abstract

Vid2World repurposes pre-trained video diffusion models into interactive world models via causalization and action guidance, enhancing action controllability and scalability in complex environments.
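
As a rough illustration of what "action guidance" can mean at sampling time, the sketch below uses a classifier-free-guidance-style blend: the denoiser is queried once with the true actions and once with them replaced by a "null" action, and the two predictions are mixed. The function, the `denoiser` interface, `null_action`, and `guidance_scale` are assumptions for this example, not the paper's exact procedure.

```python
# Illustrative sketch only (assumed interface, not the paper's exact method).
import torch


@torch.no_grad()
def guided_noise_prediction(denoiser, noisy_frames, timesteps, actions,
                            null_action: int, guidance_scale: float = 2.0):
    # Action-conditional prediction.
    eps_cond = denoiser(noisy_frames, timesteps, actions)
    # Action-unconditional prediction (all actions replaced by the null token).
    eps_uncond = denoiser(noisy_frames, timesteps,
                          torch.full_like(actions, null_action))
    # Push the sample toward outcomes consistent with the given actions.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```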