InfiniPot-V is a training-free, query-agnostic framework that compresses the key-value cache during video encoding to maintain a fixed memory cap for streaming video understanding, enhancing real-time performance and accuracy.

This paper introduces InfiniPot-V, a method that lets AI models understand streaming video in real time while keeping memory use fixed and low. It does this by compressing the key-value (KV) cache as the video is processed.
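To make the idea of a fixed memory cap concrete, here is a minimal sketch of a bounded KV cache for a stream of frames. It is an illustration under stated assumptions, not InfiniPot-V's actual algorithm: the class name `BoundedKVCache`, the `cap` and `keep` parameters, and the scoring rule (L2 norm of each cached value vector) are all hypothetical choices made for this example. The only properties taken from the description above are that compression happens while the video is being encoded, that it does not depend on a user query, and that memory stays under a fixed limit.

```python
from typing import List, Tuple

Vector = List[float]
Entry = Tuple[int, Vector, Vector]  # (frame index, key vector, value vector)


def l2_norm(v: Vector) -> float:
    """Euclidean length of a vector."""
    return sum(x * x for x in v) ** 0.5


class BoundedKVCache:
    """Toy cache that never holds more than `cap` entries.

    Hypothetical illustration only. Whenever the cache reaches `cap`
    entries it is compressed down to `keep` entries.
    """

    def __init__(self, cap: int, keep: int) -> None:
        if not 0 < keep < cap:
            raise ValueError("need 0 < keep < cap")
        self.cap = cap
        self.keep = keep
        self.entries: List[Entry] = []

    def append(self, frame_idx: int, key: Vector, value: Vector) -> None:
        """Add one entry, compressing if the cap has been reached."""
        self.entries.append((frame_idx, key, value))
        if len(self.entries) >= self.cap:
            self._compress()

    def _compress(self) -> None:
        # Query-agnostic: the score depends only on cached content,
        # never on a user question. Value norm is an assumed stand-in
        # for whatever importance measure a real system would use.
        ranked = sorted(self.entries, key=lambda e: l2_norm(e[2]), reverse=True)
        kept = ranked[: self.keep]
        kept.sort(key=lambda e: e[0])  # restore temporal order
        self.entries = kept


if __name__ == "__main__":
    cache = BoundedKVCache(cap=4, keep=2)
    for t in range(10):
        cache.append(t, [float(t)], [float(t % 5)])
        print(t, len(cache.entries))
```

However long the stream runs, the cache size printed in the loop stays below `cap`, which is the sense in which memory is "fixed" here. A real system would operate on per-layer, per-head tensors rather than Python lists.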

InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding

Summary

What's the problem?

What's the solution?

Why does it matter?

Abstract