TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao, Yicheng Li, Yuancheng Wei, Lei Li, Shuhuai Ren, Yuanxin Liu, Kun Ouyang, Lean Wang, Shicheng Li, Sida Li, Lingpeng Kong, Qi Liu, Yuanxing Zhang, Xu Sun
2025-04-25
Summary
This paper presents TimeChat-Online, an AI system that processes live video streams more efficiently by recognizing and skipping repeated or unnecessary visual information.
What's the problem?
When AI models analyze streaming video, they spend much of their time and computing power on frames that are almost identical to the ones before them, which slows down processing.
What's the solution?
The researchers built TimeChat-Online with a component called the Differential Token Drop module, which detects visual tokens that only repeat what the model has already seen and drops them. About 80% of the visual tokens in a streaming video turn out to be redundant in this way. Removing them makes the model much faster and more efficient, while it still understands what is happening in the video.
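To make the idea of dropping repeated tokens concrete, here is a minimal sketch in Python. It is not the paper's implementation: the summary does not say how redundancy is measured, so the comparison rule (cosine similarity against the token at the same position in the previous frame), the `threshold` value, and the function names `drop_redundant_tokens` and `drop_ratio` are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Token = Sequence[float]   # one visual token, as a feature vector
Frame = Sequence[Token]   # one frame, as a list of tokens in fixed spatial order


def cosine_similarity(a: Token, b: Token) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treated as "not similar", so the token is kept
    return dot / (norm_a * norm_b)


def drop_redundant_tokens(frames: Sequence[Frame],
                          threshold: float = 0.95) -> List[List[bool]]:
    """Return one keep-mask per frame (True = keep the token).

    Assumed rule (hypothetical, not taken from the paper): a token is
    redundant when it is at least `threshold`-similar to the token at the
    same position in the previous frame. The first frame is kept whole.
    """
    masks: List[List[bool]] = []
    previous = None
    for frame in frames:
        if previous is None:
            mask = [True] * len(frame)
        else:
            if len(frame) != len(previous):
                raise ValueError("all frames must have the same number of tokens")
            mask = [cosine_similarity(tok, prev_tok) < threshold
                    for tok, prev_tok in zip(frame, previous)]
        masks.append(mask)
        previous = frame
    return masks


def drop_ratio(masks: Sequence[Sequence[bool]]) -> float:
    """Fraction of all tokens that were dropped."""
    total = sum(len(m) for m in masks)
    kept = sum(sum(m) for m in masks)
    return 1.0 - kept / total if total else 0.0


# Toy stream: three frames of two 2-dimensional tokens each.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],  # first token unchanged, second token changed
    [[1.0, 0.0], [1.0, 0.0]],  # nothing changed since the previous frame
]
masks = drop_redundant_tokens(frames)
print(masks)
print(drop_ratio(masks))
```

In this toy stream only tokens that change between frames are kept after the first frame. A mostly static stream, such as a fixed security camera, would push the drop ratio much higher.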
Why does it matter?
This matters because AI can then keep up with real-time video, like livestreams or security cameras, without needing very powerful computers, which makes it more practical for everyday use.
Abstract
TimeChat-Online, an online VideoLLM with a Differential Token Drop module, handles real-time video streams efficiently by dropping redundant visual tokens, and achieves better performance than existing models.