Next Embedding Prediction Makes World Models Stronger
George Bredis, Nikita Balagansky, Daniil Gavrilov, Ruslan Rakhimov
2026-03-04
Summary
This paper introduces a new method called NE-Dreamer for teaching computers to learn through trial and error, specifically in situations where they can't fully observe their environment.
What's the problem?
When a computer is learning to do something, like play a game or control a robot, it needs to understand how its actions affect the world over time. This is especially hard when the computer can't see everything – imagine trying to play a game with a blurry screen! Existing methods often struggle to keep track of important information over longer periods, or they rely on extra training signals, like reconstructing every pixel of what they see, to learn effectively.
What's the solution?
NE-Dreamer solves this by using a special tool called a 'temporal transformer.' Think of it like a really good memory that helps the computer predict what will happen next based on what it has already experienced. Instead of trying to recreate what it sees (like some other methods), NE-Dreamer focuses on predicting the *important* parts of the situation as it changes, directly in a simplified representation of the world. This allows it to learn more efficiently and build a better understanding of how things work.
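The core idea can be sketched in a few lines: encode each observation into an embedding, run a causally masked temporal transformer over the embedding sequence, and train it to predict the *next* embedding rather than reconstruct pixels. The snippet below is an illustrative sketch only, not the authors' implementation: the linear encoder, the plain mean-squared-error loss, the stop-gradient on the target, and all dimensions are assumptions made for brevity.

```python
# Illustrative sketch of next-embedding prediction (assumed details, not NE-Dreamer's actual code).
import torch
import torch.nn as nn


class NextEmbeddingPredictor(nn.Module):
    def __init__(self, obs_dim=16, embed_dim=32, n_heads=4, n_layers=2):
        super().__init__()
        # Stand-in for a real observation encoder (e.g. a CNN over images).
        self.encoder = nn.Linear(obs_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(
            embed_dim, n_heads, dim_feedforward=64, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(embed_dim, embed_dim)

    def loss(self, obs_seq):
        # obs_seq: (batch, time, obs_dim)
        z = self.encoder(obs_seq)                  # embeddings z_1..z_T
        T = z.size(1)
        # Causal mask so step t only attends to steps <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.temporal(z, mask=mask)            # h_t summarizes z_1..z_t
        pred = self.head(h[:, :-1])                # predict z_{t+1} from h_t
        target = z[:, 1:].detach()                 # stop-gradient target (assumed)
        # No decoder, no pixel reconstruction: the loss lives in embedding space.
        return ((pred - target) ** 2).mean()


model = NextEmbeddingPredictor()
obs = torch.randn(8, 10, 16)   # batch of 8 sequences, 10 steps each
loss = model.loss(obs)
```

The point of the sketch is the shape of the objective: everything happens in the learned representation space, so no image decoder is ever built.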
Why it matters?
This research shows that predicting what happens next with temporal transformers is a powerful way to teach computers to learn in complex and uncertain environments. The method performs as well as, or better than, other leading approaches on challenging benchmarks, suggesting it could help build more capable robots and AI systems that handle real-world situations more effectively.
Abstract
Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.