MovieCORE: COgnitive REasoning in Movies
Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Yung-Hao Tang, Shang-Hong Lai, Winston H. Hsu
2025-08-27
Summary
This paper introduces a new dataset called MovieCORE, which is designed to test how well AI can *really* understand movies, going beyond just recognizing what's happening on screen.
What's the problem?
Current AI systems that answer questions about videos tend to focus on simple, obvious details. They struggle with questions that require deeper reasoning about the story, characters, and themes – what psychologists call 'System-2 thinking'. Existing datasets don't push AI to demonstrate this deeper understanding, and current video-language models perform poorly on questions that demand it.
What's the solution?
The researchers created MovieCORE by using multiple AI language models to brainstorm and create challenging questions and answers about movies. They also developed tests to make sure the questions were actually thought-provoking and complex. To help existing AI models improve, they built a module called ACE that enhances the model's reasoning abilities *after* it's already been trained, boosting performance by up to 25%.
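The brainstorming idea described above can be sketched in miniature: several "thought agents" propose candidate questions about a movie, and a filter keeps only the thought-provoking ones. This is a hypothetical illustration, not the paper's actual pipeline; the agents are deterministic stubs (a real system would call LLM APIs), and `is_system2` is a toy stand-in for the paper's cognitive tests.

```python
# Hedged sketch of multi-agent question brainstorming (stub agents, not the
# paper's implementation). Each agent maps a movie title/synopsis to
# candidate questions; a filter approximates a "System-2" quality check.
from typing import Callable, List

ThoughtAgent = Callable[[str], List[str]]  # synopsis -> candidate questions

def plot_agent(synopsis: str) -> List[str]:
    # Proposes a surface-level, recall-style question.
    return [f"What happens after the opening scene of '{synopsis}'?"]

def theme_agent(synopsis: str) -> List[str]:
    # Proposes a causal/thematic question requiring inference.
    return [f"Why does the protagonist's motivation in '{synopsis}' shift, "
            "and what does that imply about the film's central theme?"]

def is_system2(question: str) -> bool:
    # Toy stand-in for the paper's cognitive tests (depth,
    # thought-provocation, syntactic complexity): keep only questions
    # with causal or inferential phrasing.
    return any(cue in question.lower() for cue in ("why", "imply", "motivation"))

def brainstorm(synopsis: str, agents: List[ThoughtAgent]) -> List[str]:
    # Pool all agents' candidates, then filter to the challenging ones.
    candidates = [q for agent in agents for q in agent(synopsis)]
    return [q for q in candidates if is_system2(q)]

questions = brainstorm("Vertigo", [plot_agent, theme_agent])
```

Here the recall-style question is filtered out while the causal one survives, mirroring the dataset's emphasis on deeper questions over surface-level ones.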
Why does it matter?
This work is important because it pushes the field of AI closer to truly understanding movies like humans do. It highlights the weaknesses of current AI systems and provides a new benchmark for evaluating progress. By improving AI's ability to understand nuanced cinematic content, we can build more intelligent and helpful AI assistants and potentially unlock new ways to interact with and analyze films.
Abstract
This paper introduces MovieCORE, a novel video question answering (VQA) dataset designed to probe deeper cognitive understanding of movie content. Unlike existing datasets that focus on surface-level comprehension, MovieCORE emphasizes questions that engage System-2 thinking while remaining specific to the video material. We present an innovative agentic brainstorming approach, utilizing multiple large language models (LLMs) as thought agents to generate and refine high-quality question-answer pairs. To evaluate dataset quality, we develop a set of cognitive tests assessing depth, thought-provocation potential, and syntactic complexity. We also propose a comprehensive evaluation scheme for assessing VQA model performance on deeper cognitive tasks. To address the limitations of existing video-language models (VLMs), we introduce an agentic enhancement module, Agentic Choice Enhancement (ACE), which improves model reasoning capabilities post-training by up to 25%. Our work contributes to advancing movie understanding in AI systems and provides valuable insights into the capabilities and limitations of current VQA models when faced with more challenging, nuanced questions about cinematic content. Our project page, dataset and code can be found at https://joslefaure.github.io/assets/html/moviecore.html.
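The abstract's Agentic Choice Enhancement (ACE) module can be loosely illustrated as agents voting over a model's candidate answers. This is a speculative sketch of the general agentic-selection idea, not ACE's actual mechanism; the critic functions and their heuristics are invented stubs for illustration only.

```python
# Hedged sketch of agentic answer selection (hypothetical, not the paper's
# ACE design): several "critic" agents each pick a candidate answer, and the
# majority-voted choice is returned.
from collections import Counter
from typing import Callable, List

Critic = Callable[[str, List[str]], int]  # (question, answers) -> chosen index

def literal_critic(question: str, answers: List[str]) -> int:
    # Naive surface-level heuristic: prefer the shortest answer.
    return min(range(len(answers)), key=lambda i: len(answers[i]))

def reasoning_critic(question: str, answers: List[str]) -> int:
    # Toy System-2-style bias: prefer answers that state a cause.
    for i, ans in enumerate(answers):
        if "because" in ans.lower():
            return i
    return 0

def agentic_select(question: str, answers: List[str],
                   critics: List[Critic]) -> str:
    # Tally each critic's vote and return the majority choice.
    votes = Counter(critic(question, answers) for critic in critics)
    best_index, _ = votes.most_common(1)[0]
    return answers[best_index]

answers = ["Yes.", "Yes, because the editing hides the twist until the reveal."]
choice = agentic_select("Does the film mislead the viewer?", answers,
                        [literal_critic, reasoning_critic, reasoning_critic])
```

With reasoning-biased critics in the majority, the explanatory answer wins the vote, which is the flavor of post-training improvement the abstract attributes to ACE.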