Token Bottleneck: One Token to Remember Dynamics

Taekyung Kim, Dongyoon Han, Byeongho Heo, Jeongeun Park, Sangdoo Yun

2025-07-11

Summary

This paper introduces Token Bottleneck (ToBo), a self-supervised method that teaches a vision model to compress what it sees in a video frame into a small, meaningful summary, helping it understand how scenes change over time.

What's the problem?

AI models usually struggle to keep track of dynamic scenes and of changes between video frames. They process each frame as many small pieces handled separately, which makes it hard to follow the flow of events.

What's the solution?

The researchers designed ToBo to squeeze a whole scene into a single compact token that holds its essential information. The model then predicts the next scene from this token plus only a few hints taken from that next scene. Because the hints are so sparse, the token has to carry the information needed to work out how the scene changes over time.
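The squeeze-then-predict data flow described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: mean pooling stands in for the learned encoder, the prediction is a placeholder for a learned decoder, and the function names, patch size, and 0.9 mask ratio are hypothetical choices made for this example.

```python
import numpy as np

def patchify(frame, patch):
    """Split an (H, W, C) frame into (N, patch*patch*C) flattened patches."""
    h, w, c = frame.shape
    gh, gw = h // patch, w // patch
    x = frame[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)

def squeeze(source_patches):
    """Stand-in encoder (assumption): pool all source patches into one token."""
    return source_patches.mean(axis=0, keepdims=True)  # shape (1, D)

def sample_hints(num_patches, mask_ratio, rng):
    """Choose the few target patches that stay visible; the rest are masked."""
    num_visible = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    perm = rng.permutation(num_patches)
    return np.sort(perm[:num_visible]), np.sort(perm[num_visible:])

def decoder_inputs(bottleneck, target_patches, visible):
    """The decoder sees only the bottleneck token plus the visible hints."""
    return np.concatenate([bottleneck, target_patches[visible]], axis=0)

def reconstruction_loss(pred, target_patches, masked):
    """Mean squared error measured on the masked target patches only."""
    return float(np.mean((pred[masked] - target_patches[masked]) ** 2))

rng = np.random.default_rng(0)
source_frame = rng.random((32, 32, 3))  # synthetic stand-in for frame t
target_frame = rng.random((32, 32, 3))  # synthetic stand-in for a later frame

src = patchify(source_frame, patch=8)   # 16 patches of dimension 192
tgt = patchify(target_frame, patch=8)
token = squeeze(src)                    # the single bottleneck token
visible, masked = sample_hints(len(tgt), mask_ratio=0.9, rng=rng)
dec_in = decoder_inputs(token, tgt, visible)

# Placeholder prediction (assumption): repeat the token at every position.
# A real decoder would be a trained network operating on dec_in.
pred = np.repeat(token, len(tgt), axis=0)
loss = reconstruction_loss(pred, tgt, masked)
print(dec_in.shape, len(masked), loss)
```

The point of the sketch is the shape of the decoder input: everything the decoder knows about the source scene has to pass through the single row `token`, while the target scene contributes only the handful of rows selected by `visible`.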

Why does it matter?

ToBo helps AI better understand videos and other sequences. That can improve tasks such as tracking objects and helping robots understand their environment, and it can make AI more effective in real-world applications.

Abstract

ToBo, a self-supervised learning pipeline, generates compact and temporally aware visual representations by encoding scenes into a bottleneck token and predicting subsequent scenes with minimal hints, demonstrating superior performance in sequential tasks.
