Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Junhao Cheng, Yuying Ge, Teng Wang, Yixiao Ge, Jing Liao, Ying Shan

2025-05-28

Summary

This paper introduces Video-Holmes, a benchmark that tests whether advanced AI models can solve mysteries in suspenseful short films the way Sherlock Holmes would.

What's the problem?

AI models can understand simple things in videos, but they struggle to combine many clues and pieces of information to solve complicated problems, and they do this far less well than humans.

What's the solution?

The researchers created the Video-Holmes benchmark, which uses suspense films to test how well these AI models can gather clues, connect details, and reason through complex situations. They found that the models have a tough time matching human-level reasoning.

Why does it matter?

This matters because AI that is meant to help with real-world problems involving video understanding and reasoning needs to get much better at connecting information and thinking things through, just like a detective would.

Abstract

The Video-Holmes benchmark evaluates the complex video reasoning capabilities of multimodal large language models (MLLMs) using suspense short films, and reveals significant challenges in information integration compared to human experts.