
Video Occupancy Models

Manan Tomar, Philippe Hansen-Estruch, Philip Bachman, Alex Lamb, John Langford, Matthew E. Taylor, Sergey Levine

2024-07-16


Summary

This paper introduces Video Occupancy Models (VOCs), a family of video prediction models designed to support downstream control tasks. VOCs predict future states in a compact latent space rather than pixel by pixel.

What's the problem?

Traditional video prediction models are computationally expensive because they generate every pixel of every future frame, one frame at a time. To look far into the future, such a model must be rolled forward step by step. This is slow, and it is a poor fit for control tasks, such as robot control, that depend on fast decision-making.

What's the solution?

VOCs operate in a compact latent space. They summarize the important information in the video, so they never have to predict individual pixels. They also skip step-by-step rollouts: a VOC predicts the discounted distribution of future states directly, in a single step. Nearer futures carry more weight in this distribution than distant ones. Together, these two properties make VOCs faster and more efficient for downstream control tasks, such as robotics or other automated systems.
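To make the idea of a discounted distribution of future states concrete, here is a minimal tabular sketch. It assumes a small, hand-written Markov chain `P` over three hypothetical latent states. VOCs learn this object over video latents and do not have access to a known table like this. The occupancy M(s'|s) = (1 - gamma) * sum over k >= 1 of gamma^(k-1) * P^k(s'|s) is computed in two ways. The first sums multistep rollouts. The second iterates the one-step recursion M = (1 - gamma) * P + gamma * P * M. This recursion is the standard property that lets a model learn M directly by bootstrapping from its own prediction, so that a query about the future becomes a single call instead of a rollout. Whether the paper trains on exactly this recursion is an assumption here.

```python
# Tabular sketch of a discounted state-occupancy distribution.
# Assumption: P is a hypothetical 3-state Markov chain standing in for
# latent-state dynamics; it is not taken from the VOC paper.
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def occupancy_by_rollout(P: Matrix, gamma: float, horizon: int) -> Matrix:
    """Truncated (1 - gamma) * sum_{k=1..horizon} gamma^(k-1) * P^k (many rollout steps)."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    Pk = [row[:] for row in P]  # holds P^k, starting at P^1
    for k in range(1, horizon + 1):
        w = (1.0 - gamma) * gamma ** (k - 1)
        for i in range(n):
            for j in range(n):
                M[i][j] += w * Pk[i][j]
        Pk = matmul(Pk, P)
    return M


def occupancy_by_fixed_point(P: Matrix, gamma: float, iters: int) -> Matrix:
    """Iterate M <- (1 - gamma) * P + gamma * P @ M, the one-step recursion M satisfies."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        PM = matmul(P, M)
        M = [[(1.0 - gamma) * P[i][j] + gamma * PM[i][j] for j in range(n)]
             for i in range(n)]
    return M


P = [[0.9, 0.1, 0.0],
     [0.0, 0.8, 0.2],
     [0.5, 0.0, 0.5]]
gamma = 0.9
M_roll = occupancy_by_rollout(P, gamma, horizon=300)
M_fix = occupancy_by_fixed_point(P, gamma, iters=300)
print([round(sum(row), 6) for row in M_roll])  # each row is a probability distribution
```

`M_roll[0][2]` is positive even though `P[0][2]` is zero. The occupancy captures states that are reachable only after several steps, which a one-step model would reveal only through a rollout.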

Why does it matter?

This research matters because it makes video prediction cheaper and faster to use in practice. VOCs avoid both pixel-level prediction and multistep rollouts, which makes it easier to build systems that act on video input, such as robots or autonomous vehicles.

Abstract

We introduce a new family of video prediction models designed to support downstream control tasks. We call these models Video Occupancy Models (VOCs). VOCs operate in a compact latent space, thus avoiding the need to make predictions about individual pixels. Unlike prior latent-space world models, VOCs directly predict the discounted distribution of future states in a single step, thus avoiding the need for multistep roll-outs. We show that both properties are beneficial when building predictive models of video for use in downstream control. Code is available at https://github.com/manantomar/video-occupancy-models.
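A Monte Carlo sketch can illustrate what predicting the discounted future "in a single step" means at training time. A rollout model is trained to predict frame t + 1 and is then unrolled. Here, a model instead maps the latent at time t directly to the latent at time t + k, where k is drawn from a geometric distribution with P(k) = (1 - gamma) * gamma^(k-1). Up to truncation at the end of the trajectory, these targets are samples from the discounted occupancy. This is a generic recipe assumed for illustration. The function names, the integer latent codes, and the rejection of out-of-range draws are all hypothetical. The VOC paper's actual training objective may differ; for example, it may use a bootstrapped, TD-style target.

```python
# Monte Carlo construction of (current, future) training pairs for a one-step
# occupancy model. Assumptions: `latents` is a hypothetical sequence of discrete
# latent codes for consecutive frames; this is not the paper's training code.
import random
from typing import List, Sequence, Tuple


def sample_geometric_offset(gamma: float, rng: random.Random) -> int:
    """Draw k >= 1 with P(k) = (1 - gamma) * gamma ** (k - 1)."""
    k = 1
    while rng.random() < gamma:  # extend the horizon by one step with probability gamma
        k += 1
    return k


def occupancy_pairs(latents: Sequence[int], gamma: float, num_pairs: int,
                    seed: int = 0) -> List[Tuple[int, int]]:
    """Pair latents[t] with latents[t + k], where k ~ Geometric(1 - gamma).

    Draws that run past the end of the trajectory are rejected. This is a
    simplification that under-weights long horizons for late start times.
    """
    if len(latents) < 2 or not 0.0 <= gamma < 1.0:
        raise ValueError("need at least two latents and 0 <= gamma < 1")
    rng = random.Random(seed)
    pairs: List[Tuple[int, int]] = []
    while len(pairs) < num_pairs:
        t = rng.randrange(len(latents) - 1)
        k = sample_geometric_offset(gamma, rng)
        if t + k < len(latents):
            pairs.append((latents[t], latents[t + k]))
    return pairs


rng = random.Random(0)
offsets = [sample_geometric_offset(0.5, rng) for _ in range(20000)]
print(sum(offsets) / len(offsets))  # close to 1 / (1 - gamma) = 2

trajectory = list(range(200))  # code id equals time index, so offsets are visible
pairs = occupancy_pairs(trajectory, gamma=0.9, num_pairs=1000)
```

A model trained on such pairs answers "where will I be, on discounted average?" with one forward pass. The discount gamma sets the effective horizon, roughly 1 / (1 - gamma) steps.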