NitroGen: An Open Foundation Model for Generalist Gaming Agents
Loïc Magne, Anas Awadalla, Guanzhi Wang, Yinzhen Xu, Joshua Belofsky, Fengyuan Hu, Joohwan Kim, Ludwig Schmidt, Georgia Gkioxari, Jan Kautz, Yisong Yue, Yejin Choi, Yuke Zhu, Linxi "Jim" Fan
2026-01-07
Summary
This paper introduces NitroGen, a new artificial intelligence model designed to play many different video games. It's a 'foundation model,' meaning it's trained on a huge amount of gameplay data to become broadly competent at gaming, rather than being specifically programmed for one game.
What's the problem?
Creating AI that can play a wide variety of games is really hard. Most AI is built to master a single game and struggles when faced with new rules, controls, or even just different visual styles. Before this work, there was neither a good way to train an AI to be a generally skilled gamer nor a standard benchmark for testing how well an AI adapts to games it has never seen.
What's the solution?
The researchers built NitroGen by first collecting 40,000 hours of gameplay footage from over 1,000 different games. They then used this footage to teach the AI to connect what it *sees* on the screen with the *actions* a player takes. This is called 'behavior cloning.' They also created a set of games to test how well NitroGen could play games it wasn't specifically trained on, measuring its ability to generalize. Finally, they released all the data and the model itself so other researchers can build on their work.
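To make the 'behavior cloning' idea concrete, here is a minimal sketch of the training recipe: treat recorded gameplay as a supervised dataset of (frame, player action) pairs and fit a policy to predict the action from the frame. Everything below is illustrative, not from the paper; the "frames" are random feature vectors, the "player actions" are synthetic labels, and the policy is a toy linear softmax classifier rather than NitroGen's vision-action model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the video-action dataset: each "frame" is a flattened
# feature vector, and each label is the action the player took on that frame.
N_FRAMES, FEAT_DIM, N_ACTIONS = 512, 32, 4
frames = rng.normal(size=(N_FRAMES, FEAT_DIM))
true_W = rng.normal(size=(FEAT_DIM, N_ACTIONS))
actions = (frames @ true_W).argmax(axis=1)  # pretend these came from players

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Behavior cloning is just supervised learning: fit pi(action | frame) by
# minimizing cross-entropy against the recorded player actions.
W = np.zeros((FEAT_DIM, N_ACTIONS))
lr = 0.5
for _ in range(200):
    probs = softmax(frames @ W)                      # policy's action probabilities
    onehot = np.eye(N_ACTIONS)[actions]              # recorded actions as targets
    grad = frames.T @ (probs - onehot) / N_FRAMES    # cross-entropy gradient
    W -= lr * grad

# How often the cloned policy picks the same action as the "player".
accuracy = ((frames @ W).argmax(axis=1) == actions).mean()
```

At NitroGen's scale the same objective is applied with a large vision-action network over 40,000 hours of video, but the supervision signal is identical: predict the player's action from what is on screen.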
Why does it matter?
This work is important because it's a big step towards creating AI agents that can truly understand and interact with complex environments like video games. This isn't just about better game-playing AI; the skills learned here – understanding visuals, making decisions, and controlling characters – could be applied to real-world robotics and other areas where AI needs to act in the physical world. The release of the data and model will help accelerate research in this field.
Abstract
We introduce NitroGen, a vision-action foundation model for generalist gaming agents that is trained on 40,000 hours of gameplay videos across more than 1,000 games. We incorporate three key ingredients: 1) an internet-scale video-action dataset constructed by automatically extracting player actions from publicly available gameplay videos, 2) a multi-game benchmark environment that can measure cross-game generalization, and 3) a unified vision-action model trained with large-scale behavior cloning. NitroGen exhibits strong competence across diverse domains, including combat encounters in 3D action games, high-precision control in 2D platformers, and exploration in procedurally generated worlds. It transfers effectively to unseen games, achieving up to 52% relative improvement in task success rates over models trained from scratch. We release the dataset, evaluation suite, and model weights to advance research on generalist embodied agents.