GameFactory: Creating New Games with Generative Interactive Videos
Jiwen Yu, Yiran Qin, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu
2025-01-21

Summary
This paper introduces GameFactory, a new AI system that can create entirely new video games by generating interactive videos. It's like having a super-smart game designer that can dream up endless new game worlds and scenarios after watching videos of other games and real-world scenes.
What's the problem?
Current AI systems for making games are limited because they can only create content that looks like the games they were trained on. It's like they're stuck copying existing game styles and can't come up with truly new and different game worlds. This limits how creative and diverse AI-generated games can be.
What's the solution?
The researchers created GameFactory, which builds on video AI models pre-trained on tons of different videos, not just game footage. Their key trick is a multi-phase training strategy that separates learning how a game looks from learning how players control it, so the control skills aren't locked to one game's visual style (a rough code sketch of this idea follows below). They used Minecraft as their training data but made sure GameFactory could go way beyond just Minecraft-style games. They also extended the system to generate game videos autoregressively, so players can interact with videos of unlimited length, not just short clips.
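To make that decoupling concrete, here is a minimal PyTorch-style sketch of the idea: a frozen open-domain backbone, a low-rank "style" adapter tuned in one training phase, and an action adapter tuned in the next. All names here (`StyleLoRA`, `ActionAdapter`, `set_trainable_for_phase`) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class StyleLoRA(nn.Module):
    """Low-rank adapter meant to absorb game-specific visual style
    (hypothetical stand-in for the paper's style-learning phase)."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # starts as a no-op

    def forward(self, h):
        return h + self.up(self.down(h))

class ActionAdapter(nn.Module):
    """Injects player-action embeddings (keyboard/mouse) into the
    video features, independently of visual style."""
    def __init__(self, dim: int, action_dim: int):
        super().__init__()
        self.proj = nn.Linear(action_dim, dim)

    def forward(self, h, action_emb):
        return h + self.proj(action_emb)

def set_trainable_for_phase(phase, backbone, style_lora, action_adapter):
    """Phase 1 tunes only the style adapter; phase 2 tunes only the
    action adapter, keeping style fixed so control can generalize."""
    for p in backbone.parameters():
        p.requires_grad = False  # open-domain prior stays frozen throughout
    for p in style_lora.parameters():
        p.requires_grad = (phase == 1)
    for p in action_adapter.parameters():
        p.requires_grad = (phase == 2)
```

Because the two adapters are separate, the style adapter can simply be left out at inference time, so the learned action control applies to open-domain scenes rather than only to Minecraft-looking ones.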
Why it matters?
This matters because it could completely change how video games are made. Instead of teams of people spending years designing every little detail of a game world, AI could generate new, unique game environments instantly. This could lead to games that are always fresh and surprising, with infinite new worlds to explore. It could make game development faster and cheaper, allowing for more creative and experimental games. Plus, the technology behind GameFactory could be used for other things too, like creating virtual training simulations or even helping design real-world spaces.
Abstract
Generative game engines have the potential to revolutionize game development by autonomously creating new content and reducing manual workload. However, existing video-based game generation methods fail to address the critical challenge of scene generalization, limiting their applicability to existing games with fixed styles and scenes. In this paper, we present GameFactory, a framework focused on exploring scene generalization in game video generation. To enable the creation of entirely new and diverse games, we leverage pre-trained video diffusion models trained on open-domain video data. To bridge the domain gap between open-domain priors and the small-scale game dataset, we propose a multi-phase training strategy that decouples game style learning from action control, preserving open-domain generalization while achieving action controllability. Using Minecraft as our data source, we release GF-Minecraft, a high-quality, diverse, action-annotated video dataset for research. Furthermore, we extend our framework to enable autoregressive action-controllable game video generation, allowing the production of unlimited-length interactive game videos. Experimental results demonstrate that GameFactory effectively generates open-domain, diverse, and action-controllable game videos, representing a significant step forward in AI-driven game generation. Our dataset and project page are publicly available at https://vvictoryuki.github.io/gamefactory/.
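The autoregressive extension the abstract mentions can be pictured as a rollout loop: each new chunk of frames is generated conditioned on the most recent frames plus the player's current action. The sketch below is only an illustration of that loop; `model.sample`, its arguments, and the chunk/context sizes are assumed interfaces, not the paper's actual API.

```python
import torch

@torch.no_grad()
def rollout_interactive_video(model, first_frames, get_player_action,
                              chunk_len=16, context_len=8, num_chunks=100):
    """Hypothetical autoregressive rollout: condition each new chunk on
    the last `context_len` frames and the current player action."""
    frames = first_frames  # tensor of shape (T0, C, H, W)
    for _ in range(num_chunks):
        context = frames[-context_len:]   # recent frames as conditioning
        action = get_player_action()      # e.g. an action embedding
        # Assumed sampling interface: denoise a new chunk given context+action.
        new_chunk = model.sample(context=context, action=action,
                                 num_frames=chunk_len)
        frames = torch.cat([frames, new_chunk], dim=0)
    return frames  # grows without a fixed length limit
```

Because only a fixed-size window of recent frames is fed back in at each step, such a loop can in principle run indefinitely, which is what makes unlimited-length interactive videos possible.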