Recognition of Abnormal Events in Surveillance Videos using Weakly Supervised Dual-Encoder Models
Noam Tsfaty, Avishai Weizman, Liav Cohen, Moshe Tshuva, Yehudit Aperstein
2025-12-01
Summary
This paper focuses on automatically finding unusual events in surveillance footage, such as crimes or accidents, when the system is trained only with video-level labels rather than detailed annotations.
What's the problem?
Normally, to teach a computer to recognize anomalies in videos, you need to specifically label *what* is abnormal in each frame. This is time-consuming and difficult. The problem this paper tackles is how to identify these rare, strange events in videos when you only tell the system whether the *entire video* contains an anomaly, not where or what it is.
What's the solution?
The researchers built a system with two encoders: a convolutional network that captures what things *look* like, and a transformer, which is good at modeling relationships between different parts of the video over time. The two representations are combined through a method called 'top-k pooling', which keeps only the strongest responses and so highlights the most important features. Evaluated on UCF-Crime, a dataset of crime videos, the system achieved an area under the curve (AUC) of 90.7%. AUC measures how well anomalies are ranked above normal footage, so it is not the same as accuracy.
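To make the idea of top-k pooling under video-level supervision more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that each video is split into segments, that each encoder yields one feature vector per segment, that the two vectors are fused by concatenation, and that pooling is applied to per-segment anomaly scores. The function names, the linear scorer, and the toy values are all hypothetical.

```python
import math


def fuse_segments(conv_feats, trans_feats):
    """Concatenate per-segment features from the two encoders (assumed fusion)."""
    if len(conv_feats) != len(trans_feats):
        raise ValueError("both encoders must cover the same number of segments")
    return [c + t for c, t in zip(conv_feats, trans_feats)]


def segment_scores(fused, weights, bias=0.0):
    """Score each segment with a hypothetical linear layer followed by a sigmoid."""
    scores = []
    for feat in fused:
        z = sum(w * x for w, x in zip(weights, feat)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def top_k_pool(scores, k):
    """Average the k highest segment scores to get one video-level score."""
    k = max(1, min(k, len(scores)))
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


# Toy video with four segments and 2-D features from each encoder.
conv = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5], [0.0, 0.3]]
trans = [[0.0, 0.1], [0.1, 0.0], [1.8, 2.2], [0.2, 0.1]]
fused = fuse_segments(conv, trans)
scores = segment_scores(fused, weights=[1.0, 1.0, 1.0, 1.0], bias=-3.0)
video_score = top_k_pool(scores, k=2)
```

Because only the k highest-scoring segments contribute to `video_score`, a video-level label can drive training even when the anomaly occupies a small part of the video. The per-segment scores remain available afterwards for locating the event.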
Why does it matter?
This research is important because it makes anomaly detection in surveillance videos much more practical. By reducing the need for detailed labeling, it becomes easier and cheaper to deploy these systems for security and public safety, potentially helping to automatically flag suspicious activity.
Abstract
We address the challenge of detecting rare and diverse anomalies in surveillance videos using only video-level supervision. Our dual-backbone framework combines convolutional and transformer representations through top-k pooling, achieving 90.7% area under the curve (AUC) on the UCF-Crime dataset.
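For readers unfamiliar with the AUC metric, the following Python sketch computes the area under the ROC curve as the fraction of (anomalous, normal) pairs in which the anomalous item receives the higher score. The labels and scores are made-up toy values, not results from the paper, and the function name `roc_auc` is illustrative.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 marks an anomalous item, 0 a normal one.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 1.0 means every anomalous item is scored above every normal one, while 0.5 corresponds to random ranking. The metric does not depend on any particular decision threshold.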