
OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer

Boyang Wang, Guangyi Xu, Zhipeng Tang, Jiahui Zhang, Zezhou Cheng

2026-04-28

Summary

This paper focuses on shot boundary detection, which is the process of automatically figuring out where one scene ends and another begins in a video.

What's the problem?

Current methods struggle to pinpoint *exactly* where cuts occur, especially during gradual transitions such as dissolves and fades. They often miss subtle discontinuities that still matter, and they are trained on annotations that are noisy and lack diversity. In short, existing systems are unreliable and are evaluated against outdated benchmarks.

What's the solution?

The researchers developed a new system called OmniShotCut. It uses a powerful type of artificial intelligence called a Transformer to analyze videos and predict shot boundaries. Instead of just looking at individual frames, it considers the relationships *between* different parts of the video to make more accurate decisions. To train this system without relying on flawed human labeling, they created a way to automatically generate realistic video transitions with perfect boundaries. They also created a new, more challenging set of videos to test their system.
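The paper does not spell out its synthesis pipeline here, but the core idea of generating transitions whose boundaries are known by construction can be sketched as follows. This is an illustrative toy (function names, the dissolve length, and the noise "shots" are all assumptions, not from the paper): because we compose the transition ourselves, the ground-truth boundary comes for free, with no human labeling.

```python
import numpy as np

def make_dissolve(clip_a, clip_b, length):
    """Cross-dissolve: linearly blend the last `length` frames of A into
    the first `length` frames of B. Because we build the transition, the
    exact boundary range is known by construction (no manual labeling)."""
    t = np.linspace(0.0, 1.0, length)[:, None, None, None]  # blend weights
    blend = (1 - t) * clip_a[-length:] + t * clip_b[:length]
    video = np.concatenate([clip_a[:-length], blend, clip_b[length:]])
    start = len(clip_a) - length
    return video, (start, start + length)

def make_hard_cut(clip_a, clip_b):
    """Hard cut: simple concatenation; the boundary is one frame index."""
    return np.concatenate([clip_a, clip_b]), (len(clip_a), len(clip_a))

# Two toy "shots": 30 frames of 64x64 RGB noise each (stand-ins for real clips)
rng = np.random.default_rng(0)
shot_a = rng.random((30, 64, 64, 3), dtype=np.float32)
shot_b = rng.random((30, 64, 64, 3), dtype=np.float32)

video, (start, end) = make_dissolve(shot_a, shot_b, length=8)
```

A real pipeline in this spirit would cover the "major transition families" the abstract mentions (cuts, dissolves, fades, wipes, and so on) with parameterized variants, but each generator would return the same kind of exact boundary label.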

Why it matters?

This work is important because more accurate shot boundary detection can improve many video-related tasks, like video editing, summarization, and search. By creating a better system and a better way to evaluate it, the researchers are pushing the field forward and making it possible to build more intelligent video processing tools.

Abstract

Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. While SBD has been widely studied in the literature, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-diversity annotations and outdated benchmarks. To alleviate these limitations, we propose OmniShotCut to formulate SBD as structured relational prediction, jointly estimating shot ranges with intra-shot relations and inter-shot relations, by a shot query-based dense video Transformer. To avoid imprecise manual labeling, we adopt a fully synthetic transition synthesis pipeline that automatically reproduces major transition families with precise boundaries and parameterized variants. We also introduce OmniShotCutBench, a modern wide-domain benchmark enabling holistic and diagnostic evaluation.
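To make "shot query-based" concrete, a DETR-style decoder is one plausible reading: a fixed set of learned queries cross-attends to per-frame features, and each query regresses a shot range plus a confidence score. The sketch below is an assumption about the general pattern, not the paper's actual architecture (layer counts, heads, and the relational terms in the loss are all omitted or invented):

```python
import torch
import torch.nn as nn

class ShotQueryDecoder(nn.Module):
    """Illustrative DETR-style shot-query decoder (not the paper's model):
    learned shot queries cross-attend to frame features; each query
    predicts a normalized (start, end) range and a shot/no-shot logit."""
    def __init__(self, dim=256, num_queries=32, num_layers=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.range_head = nn.Linear(dim, 2)   # (start, end), normalized
        self.score_head = nn.Linear(dim, 1)   # shot vs. no-shot logit

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, dim) dense per-frame features
        b = frame_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        h = self.decoder(q, frame_feats)       # queries attend to frames
        ranges = self.range_head(h).sigmoid()  # (batch, queries, 2) in [0, 1]
        scores = self.score_head(h)            # (batch, queries, 1)
        return ranges, scores

feats = torch.randn(2, 100, 256)  # 2 toy videos, 100 frames, 256-d features
ranges, scores = ShotQueryDecoder()(feats)
```

The paper's "intra-shot and inter-shot relations" would enter as additional structured prediction targets over these queries, which is what distinguishes its formulation from a plain per-query range regression like this one.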