ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting

Jian Hu, Dimitrios Korkinof, Shaogang Gong, Mariano Beguerisse-Diaz

2025-04-25

Summary

This paper introduces ViSMaP, an AI system that can watch hour-long videos and create short, useful summaries without needing humans to label or explain what’s happening in every part of the video.

What's the problem?

Summarising hour-long videos usually takes a lot of time and effort, because people have to watch the whole thing and write down the important parts. That is expensive and impractical for huge amounts of video, like on YouTube or in security footage.

What's the solution?

The researchers built ViSMaP, which uses a technique called meta-prompting with large language models. It learns from short videos that already have descriptions and then applies what it learns to summarize much longer videos, all without needing extra human help.
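
The summary above does not spell out the models or prompts involved, so the following is only a minimal sketch of one common way to structure a meta-prompting loop: one step drafts a summary from clip descriptions, one scores the draft, and one rewrites the prompt for the next round. This three-role split is an assumption, not something this page states. Every name here (meta_prompt_loop, toy_generate, toy_score, toy_revise) is hypothetical, and the toy functions are simple stand-ins for real LLM calls. It also does not model the step of learning from short annotated videos.

```python
from typing import Callable, List, Tuple

def meta_prompt_loop(
    clip_captions: List[str],
    generate: Callable[[str, List[str]], str],
    score: Callable[[str, List[str]], float],
    revise: Callable[[str, str, float], str],
    initial_prompt: str,
    rounds: int = 3,
) -> Tuple[str, str, float]:
    """Refine a summarisation prompt over several rounds.

    Returns the best (prompt, summary, score) seen. All three callables
    are assumed placeholders for LLM calls.
    """
    prompt = initial_prompt
    best = None
    for _ in range(rounds):
        summary = generate(prompt, clip_captions)   # draft a summary
        s = score(summary, clip_captions)           # judge the draft
        if best is None or s > best[2]:
            best = (prompt, summary, s)
        prompt = revise(prompt, summary, s)         # rewrite the prompt
    return best

# Toy stand-ins (hypothetical): no model is called anywhere below.
def toy_generate(prompt: str, captions: List[str]) -> str:
    # Each "detailed" in the prompt lets one more caption into the summary.
    k = prompt.count("detailed") + 1
    return " ".join(captions[:k])

def toy_score(summary: str, captions: List[str]) -> float:
    # Fraction of clip captions that appear in the summary.
    return sum(c in summary for c in captions) / len(captions)

def toy_revise(prompt: str, summary: str, s: float) -> str:
    # Ask for more detail until every caption is covered.
    return prompt + " Be more detailed." if s < 1.0 else prompt

captions = [
    "A person chops vegetables.",
    "They fry them in a pan.",
    "They serve the meal.",
]
best_prompt, best_summary, best_score = meta_prompt_loop(
    captions, toy_generate, toy_score, toy_revise, "Summarise the video."
)
print(best_score, best_prompt)
```

In a real system each toy function would be replaced by a model call, and the score would come from a learned or LLM-based evaluator rather than string matching.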

Why does it matter?

This matters because it makes it much easier and cheaper to quickly understand the main points of long videos, which helps students, teachers, businesses, and anyone who wants to save time while still getting the important information.

Abstract

ViSMaP generates unsupervised summaries of long videos using meta-prompting with LLMs, leveraging descriptions from short videos to reduce reliance on expensive annotations.
