InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
2025-06-23
Summary
This paper introduces InfiniPot-V, a method that lets AI models understand streaming video in real time while keeping memory use fixed and low. It does this by compressing the key-value (KV) cache as the video is processed.
What's the problem?
When an AI model processes long or live video, the memory it needs keeps growing, because the model has to retain more and more information as the stream continues. That can exceed what memory-limited devices such as phones or robots can hold.
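To make the growth concrete, here is a rough back-of-the-envelope calculation of KV cache size for an uncompressed stream. The model shape (28 layers, 4 KV heads, head dimension 128, 16-bit values) and the video settings (1 frame per second, 196 tokens per frame) are illustrative assumptions, not figures from the paper; only the general formula (keys plus values, per layer, per token) is standard.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache one token occupies: a key and a value in every layer."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_bytes(seconds, frames_per_second, tokens_per_frame, per_token_bytes):
    """Total KV cache size if every video token is kept (no compression)."""
    num_tokens = seconds * frames_per_second * tokens_per_frame
    return num_tokens * per_token_bytes


# Hypothetical model and video settings, chosen only for illustration.
PER_TOKEN = kv_bytes_per_token(
    num_layers=28, num_kv_heads=4, head_dim=128, bytes_per_value=2
)

if __name__ == "__main__":
    for minutes in (1, 10, 60):
        total = kv_cache_bytes(minutes * 60, 1, 196, PER_TOKEN)
        print(f"{minutes:>3} min of video: {total / 2**30:.2f} GiB of KV cache")
```

Under these assumptions the cache grows linearly with stream length, so a stream that never ends has no upper bound on memory unless something is discarded.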
What's the solution?
The researchers created a training-free framework that automatically compresses the KV cache whenever it reaches a preset size threshold, removing redundant information while keeping the important parts. This keeps memory use under a fixed cap and lets the AI keep understanding the video without slowing down or needing extra training.
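The compress-on-threshold idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the class `BoundedKVCache`, its `cap` and `keep` parameters, and the importance score (the L2 norm of each value vector) are all hypothetical stand-ins, since this summary does not describe how InfiniPot-V actually decides which entries are redundant.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


class BoundedKVCache:
    """Toy KV cache that is compressed every time it reaches `cap` entries."""

    def __init__(self, cap, keep):
        if not 0 < keep < cap:
            raise ValueError("need 0 < keep < cap")
        self.cap = cap        # size threshold that triggers compression
        self.keep = keep      # entries retained after each compression
        self.entries = []     # (position, key, value) tuples
        self.compressions = 0

    def append(self, position, key, value):
        self.entries.append((position, key, value))
        if len(self.entries) >= self.cap:
            self._compress()

    def _compress(self):
        # Placeholder importance score (an assumption for this sketch):
        # keep the entries whose value vectors have the largest L2 norm.
        ranked = sorted(self.entries, key=lambda e: l2_norm(e[2]), reverse=True)
        # Restore temporal order among the survivors.
        self.entries = sorted(ranked[: self.keep], key=lambda e: e[0])
        self.compressions += 1


def simulate(num_tokens=1000, cap=64, keep=32, dim=8, seed=0):
    """Stream random tokens through the cache and record its peak size."""
    rng = random.Random(seed)
    cache = BoundedKVCache(cap, keep)
    peak = 0
    for pos in range(num_tokens):
        key = [rng.gauss(0, 1) for _ in range(dim)]
        value = [rng.gauss(0, 1) for _ in range(dim)]
        cache.append(pos, key, value)
        peak = max(peak, len(cache.entries))
    return cache, peak


if __name__ == "__main__":
    cache, peak = simulate()
    print("entries now:", len(cache.entries), "| peak:", peak,
          "| compressions:", cache.compressions)
```

However long the stream runs, the cache size stays bounded by `cap`. Because compression here depends only on what is already in the cache, it needs no training and no knowledge of the question that will later be asked about the video.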
Why does it matter?
This matters because it allows AI to efficiently analyze long or live videos on devices with limited memory, like phones or drones, enabling smarter real-time video understanding applications.
Abstract
InfiniPot-V is a training-free, query-agnostic framework that compresses the key-value cache during video encoding to maintain a fixed memory cap for streaming video understanding, enhancing real-time performance and accuracy.