Time Blindness: Why Video-Language Models Can't See What Humans Can?
Ujjwal Upadhyay, Mukul Ranjan, Zhiqiang Shen, Mohamed Elhoseiny
2025-06-02

Summary
This paper introduces SpookyBench, a benchmark designed to test how well AI models that understand both video and language can recognize patterns that unfold over time, especially when the individual frames look like random noise and contain no clear shapes or objects.
What's the problem?
Current video-language models struggle to detect changes or patterns in videos when there is no obvious spatial information, such as recognizable objects or scenes. As a result, they miss temporal structure that humans can spot easily.
What's the solution?
The researchers created SpookyBench, a benchmark that challenges these models with videos made of noise-like frames containing no clear objects or backgrounds, so that the information is carried only by how the frames change over time. The test reveals where models fail to pick up on time-based patterns that humans can still recognize.
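To make the idea concrete, here is a minimal sketch (not the authors' actual generation code) of the kind of stimulus described: a video in which pixels belonging to a hidden shape flicker in lockstep across frames, while background pixels flicker independently at random. Any single frame looks like pure noise, but the coherent flicker makes the shape visible over time. The function name, frame count, and square mask are illustrative assumptions.

```python
import numpy as np

def make_temporal_pattern_video(mask, num_frames=30, seed=0):
    """Hypothetical sketch: encode a shape only in temporal dynamics.

    Pixels inside `mask` flip between black and white together on every
    frame; background pixels flip independently at random. A single frame
    is indistinguishable from noise, but the coherent flicker reveals the
    shape when the frames are viewed as a video.
    """
    rng = np.random.default_rng(seed)
    h, w = mask.shape
    frames = np.empty((num_frames, h, w), dtype=np.uint8)
    for t in range(num_frames):
        bg = rng.integers(0, 2, size=(h, w))  # independent per-pixel flicker
        fg = np.full((h, w), t % 2)           # coherent flicker for the shape
        frames[t] = np.where(mask, fg, bg) * 255
    return frames

# An illustrative square "shape" hidden in the noise
mask = np.zeros((64, 64), dtype=bool)
mask[20:44, 20:44] = True
video = make_temporal_pattern_video(mask)
```

A model that reasons only over individual frames sees nothing but noise here; recovering the shape requires correlating each pixel's flicker across time, which is exactly the capability the benchmark probes.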
Why it matters?
This is important because it exposes a fundamental weakness in how AI models understand video. Closing this gap matters for real-world applications such as security monitoring, sports analysis, and any task that depends on understanding how things change over time.
Abstract
SpookyBench is a benchmark for temporal pattern recognition in videos that highlights the limitations of vision-language models in processing noise-like frames without spatial information.