SIMA 2: A Generalist Embodied Agent for Virtual Worlds
SIMA team, Adrian Bolton, Alexander Lerchner, Alexandra Cordell, Alexandre Moufarek, Andrew Bolt, Andrew Lampinen, Anna Mitenkova, Arne Olav Hallingstad, Bojan Vujatovic, Bonnie Li, Cong Lu, Daan Wierstra, Daniel P. Sawyer, Daniel Slater, David Reichert, Davide Vercelli, Demis Hassabis, Drew A. Hudson, Duncan Williams, Ed Hirst, Fabio Pardo
2025-12-05
Summary
This paper introduces SIMA 2, an AI agent designed to act like a human player inside 3D virtual worlds. It improves substantially on its predecessor, SIMA 1: it can understand and follow much more complex instructions, and it can even learn new skills on its own.
What's the problem?
Previous attempts at creating 'embodied agents' (programs that perceive and act within a virtual environment) were limited. They could only handle simple commands and struggled to understand more complex goals or adapt to new situations. In short, they were not good at acting as a helpful partner in a virtual world, and they could not learn on their own.
What's the solution?
The researchers built SIMA 2 on top of Gemini, a powerful multimodal foundation model. This lets SIMA 2 understand images as well as language, and reason about what needs to be done to achieve a goal. It can hold a conversation with a user and follow complicated instructions. It can also use Gemini to create its own tasks and judge whether it has completed them, which allows it to learn new skills autonomously. The researchers tested it across many different games and virtual environments.
Why does it matter?
This work matters because it points toward versatile agents that keep learning and could eventually operate in physical as well as virtual environments. Imagine robots that understand what you want and help you with tasks, or virtual assistants that can actually *do* things in a 3D world rather than just answer questions. SIMA 2 is a step toward making that a reality.
Abstract
We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction within an embodied environment. Unlike prior work (e.g., SIMA 1) limited to simple language commands, SIMA 2 acts as an interactive partner, capable of reasoning about high-level goals, conversing with the user, and handling complex instructions given through language and images. Across a diverse portfolio of games, SIMA 2 substantially closes the gap with human performance and demonstrates robust generalization to previously unseen environments, all while retaining the base model's core reasoning capabilities. Furthermore, we demonstrate a capacity for open-ended self-improvement: by leveraging Gemini to generate tasks and provide rewards, SIMA 2 can autonomously learn new skills from scratch in a new environment. This work validates a path toward creating versatile and continuously learning agents for both virtual and, eventually, physical worlds.
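The self-improvement loop described in the abstract (a model proposes tasks, a model judges whether they were completed, and the agent learns from the episodes that succeeded) can be illustrated with a toy sketch. Everything below is hypothetical and chosen only for illustration: the one-dimensional world, `task_setter` and `reward_model` (simple stand-ins for the Gemini task generator and reward model), and the "training" step, which here just memorizes successful action sequences. It is not SIMA 2's actual training procedure, which the abstract does not specify.

```python
import random
from dataclasses import dataclass, field

# Hypothetical toy world: the agent starts at cell 0 on a line of cells 0..SIZE-1.
SIZE = 10


@dataclass
class Task:
    goal: int  # target cell the agent is asked to reach


def task_setter(rng: random.Random) -> Task:
    """Hypothetical stand-in for the model that proposes new tasks."""
    return Task(goal=rng.randrange(SIZE))


def reward_model(task: Task, final_pos: int) -> float:
    """Hypothetical stand-in for the model that judges task completion."""
    return 1.0 if final_pos == task.goal else 0.0


def rollout(actions: list[int], start: int = 0) -> int:
    """Apply +1/-1 moves, clamped to the edges of the world; return the final cell."""
    pos = start
    for a in actions:
        pos = min(SIZE - 1, max(0, pos + a))
    return pos


@dataclass
class Agent:
    # The "policy" is a table of action sequences that previously earned reward.
    skills: dict[int, list[int]] = field(default_factory=dict)

    def act(self, task: Task, rng: random.Random, horizon: int = 12) -> list[int]:
        if task.goal in self.skills:
            return self.skills[task.goal]  # reuse a learned skill
        return [rng.choice((-1, 1)) for _ in range(horizon)]  # explore randomly

    def train(self, successes: list[tuple[Task, list[int]]]) -> None:
        # Toy "learning": keep the first successful action sequence per goal.
        for task, actions in successes:
            self.skills.setdefault(task.goal, actions)


def self_improve(agent: Agent, iterations: int = 20, episodes: int = 50,
                 seed: int = 0) -> list[float]:
    """Generate tasks, score episodes, train on successes; return success rates."""
    rng = random.Random(seed)
    history = []
    for _ in range(iterations):
        successes = []
        for _ in range(episodes):
            task = task_setter(rng)
            actions = agent.act(task, rng)
            if reward_model(task, rollout(actions)) > 0:
                successes.append((task, actions))
        agent.train(successes)
        history.append(len(successes) / episodes)
    return history
```

Calling `self_improve(Agent())` returns the success rate for each iteration. In this toy setting the rate tends to rise, because once an episode for a goal is rewarded, the agent can replay it whenever that goal is proposed again. The design choice being illustrated is that no human writes tasks or rewards; both come from (stand-in) models.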