Efficient World Models with Context-Aware Tokenization

Vincent Micheli, Eloi Alonso, François Fleuret

2024-07-01

Summary

This paper introduces Delta-IRIS, a new agent designed to improve how machines learn to understand and predict their environments through world modeling. Its key idea is to make this process faster and more efficient by encoding only what changes between time steps, rather than re-encoding each observation from scratch.

What's the problem?

Deep reinforcement learning (RL) methods are powerful for training AI agents to make decisions, but they are hard to scale. Recent transformer-based world models simulate environments accurately, yet they demand heavy computation because they must process long sequences of tokens, and the cost of attention grows rapidly with sequence length. This slows training and makes the approach impractical for complex tasks.
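
To see why sequence length dominates the cost, consider a rough back-of-envelope sketch. The token counts below are purely hypothetical (they are not figures from the paper); the point is only that self-attention cost grows quadratically with sequence length, so spending fewer tokens per time step pays off more than linearly.

```python
# Illustrative back-of-envelope only; the token counts are hypothetical,
# not figures reported in the paper.
def attention_interactions(tokens_per_frame: int, num_frames: int) -> int:
    """Pairwise interactions computed by self-attention over the full sequence."""
    seq_len = tokens_per_frame * num_frames
    return seq_len ** 2  # attention cost scales quadratically with sequence length

# Encoding every frame with 16 tokens vs. encoding only deltas with 4 tokens:
print(attention_interactions(16, 20))  # 102400 interactions
print(attention_interactions(4, 20))   # 6400 interactions (16x cheaper)
```

Quartering the tokens spent per frame cuts the attention cost sixteenfold, which is exactly the kind of saving a delta-based tokenizer aims for.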

What's the solution?

To address these issues, the authors introduce Delta-IRIS, a world model architecture that combines a discrete autoencoder with an autoregressive transformer. The autoencoder encodes only the changes (or 'deltas') between consecutive time steps as a handful of discrete tokens, while the transformer predicts future deltas conditioned on continuous tokens that summarize the current state of the world. Because each step needs far fewer tokens, Delta-IRIS trains roughly an order of magnitude faster than previous attention-based models while setting a new state of the art on the Crafter benchmark.
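
To make the architecture concrete, here is a minimal, heavily simplified sketch in PyTorch. It is an assumption-laden illustration, not the authors' implementation: real Delta-IRIS operates on image frames with convolutional networks and straight-through vector quantization, whereas this toy version uses flat vectors and linear layers and omits all training details. Every class name, dimension, and shape below is invented for illustration.

```python
# Toy sketch of the two components described above. Everything here
# (names, dimensions, flat-vector observations) is a simplifying assumption;
# the actual agent works on image frames with convolutional networks.
import torch
import torch.nn as nn

class DeltaAutoencoder(nn.Module):
    """Discrete autoencoder: tokenizes the *change* between consecutive steps."""
    def __init__(self, obs_dim=64, num_tokens=4, vocab_size=512, d_model=128):
        super().__init__()
        self.num_tokens = num_tokens
        self.encoder = nn.Linear(2 * obs_dim, num_tokens * d_model)
        self.codebook = nn.Embedding(vocab_size, d_model)  # VQ-style codebook
        self.decoder = nn.Linear(num_tokens * d_model + obs_dim, obs_dim)

    def encode(self, obs_t, obs_next):
        # Conditioning on both frames lets the code describe only what changed.
        z = self.encoder(torch.cat([obs_t, obs_next], dim=-1))
        z = z.view(-1, self.num_tokens, self.codebook.embedding_dim)
        # Nearest-codebook quantization (straight-through gradient omitted).
        book = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        return torch.cdist(z, book).argmin(dim=-1)  # (batch, num_tokens) tokens

    def decode(self, obs_t, delta_tokens):
        # Apply the decoded delta on top of the current observation.
        codes = self.codebook(delta_tokens).flatten(start_dim=1)
        return obs_t + self.decoder(torch.cat([codes, obs_t], dim=-1))

class DeltaTransformer(nn.Module):
    """Autoregressive transformer: predicts the next step's delta tokens,
    conditioned on continuous tokens summarizing the current state and action."""
    def __init__(self, obs_dim=64, act_dim=8, vocab_size=512, d_model=128):
        super().__init__()
        self.summary = nn.Linear(obs_dim + act_dim, d_model)  # continuous token
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, obs, act, delta_tokens):
        # One continuous token summarizes (observation, action) for this step...
        cond = self.summary(torch.cat([obs, act], dim=-1)).unsqueeze(1)
        # ...followed by embeddings of the delta tokens generated so far.
        seq = torch.cat([cond, self.token_emb(delta_tokens)], dim=1)
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        out = self.transformer(seq, mask=mask)  # causal self-attention
        return self.head(out)  # logits over the next delta token per position

# Usage with random stand-in data:
ae, wm = DeltaAutoencoder(), DeltaTransformer()
obs_t, obs_next = torch.randn(2, 64), torch.randn(2, 64)
act = torch.randn(2, 8)
tokens = ae.encode(obs_t, obs_next)   # (2, 4) discrete delta tokens
logits = wm(obs_t, act, tokens)       # (2, 5, 512) next-token predictions
imagined = ae.decode(obs_t, tokens)   # (2, 64) imagined next observation
```

The design intuition is that consecutive observations are mostly identical, so describing only the delta needs far fewer discrete tokens per step than re-encoding the whole frame, which in turn shortens the sequences the transformer must attend over.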

Why it matters?

This research matters because it makes learning a world model substantially faster and cheaper, lowering the barrier to applying model-based RL in practice. By improving the efficiency of world modeling, Delta-IRIS can help build smarter AI systems for robotics, games, and other settings where agents must understand complex environments from experience.

Abstract

Scaling up deep Reinforcement Learning (RL) methods presents a significant challenge. Following developments in generative modelling, model-based RL positions itself as a strong contender. Recent advances in sequence modelling have led to effective transformer-based world models, albeit at the price of heavy computations due to the long sequences of tokens required to accurately simulate environments. In this work, we propose Delta-IRIS, a new agent with a world model architecture composed of a discrete autoencoder that encodes stochastic deltas between time steps and an autoregressive transformer that predicts future deltas by summarizing the current state of the world with continuous tokens. In the Crafter benchmark, Delta-IRIS sets a new state of the art at multiple frame budgets, while being an order of magnitude faster to train than previous attention-based approaches. We release our code and models at https://github.com/vmicheli/delta-iris.