SciEducator: Scientific Video Understanding and Educating via Deming-Cycle Multi-Agent System

Zhiyu Xu, Weilong Yan, Yufei Shi, Xin Meng, Tao He, Huiping Zhuang, Ming Li, Hehe Fan

2025-11-26

Summary

This paper introduces SciEducator, a multi-agent system designed to understand scientific videos and teach the concepts they show. It is built to outperform current AI models at the complex reasoning and specialized knowledge that science education requires.

What's the problem?

Current AI models, including advanced multimodal models that process both images and text, struggle to understand science videos. Science demands specialized background knowledge and careful, step-by-step reasoning, which these models often lack. They can describe what is happening visually but are poor at explaining *how* it works scientifically.

What's the solution?

The researchers created SciEducator, a system of multiple AI 'agents' that work together and iteratively improve their own output. It is based on the Deming Cycle, a management-science method built on four steps (Plan, Do, Study, Act): the system plans an explanation, carries it out, checks whether it is correct, and then adjusts its approach. SciEducator can also create different types of learning materials, such as text instructions, visual guides, audio narrations, and interactive references, to help explain the scientific process shown in a video. To test the system, the researchers also built SciVBench, a benchmark of 500 expert-verified science questions and answers.
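The Plan-Do-Study-Act loop described above can be sketched as a simple control loop. This is a minimal sketch under loud assumptions, not SciEducator's implementation: the `CycleState` class, the `pdsa_loop` function, and the toy `toy_plan`/`toy_do`/`toy_study`/`toy_act` callables are hypothetical stand-ins for what would be LLM-backed agents, and the stopping rule (stop once Study passes, or after `max_iters` rounds) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of a Plan-Do-Study-Act loop; not the authors' code.


@dataclass
class CycleState:
    """Working memory carried across iterations."""
    question: str
    plan: list[str] = field(default_factory=list)
    answer: str = ""
    feedback: list[str] = field(default_factory=list)
    iterations: int = 0


def pdsa_loop(
    state: CycleState,
    plan: Callable[[CycleState], list[str]],
    do: Callable[[CycleState], str],
    study: Callable[[CycleState], tuple[bool, str]],
    act: Callable[[CycleState, str], None],
    max_iters: int = 3,
) -> CycleState:
    for _ in range(max_iters):
        state.iterations += 1
        state.plan = plan(state)      # Plan: choose the reasoning steps
        state.answer = do(state)      # Do: execute the steps, draft an answer
        passed, note = study(state)   # Study: check the draft
        if passed:
            break                     # stop once the check passes
        act(state, note)              # Act: record what to change next round
    return state


# Toy stand-ins for LLM-backed agents (purely illustrative).
def toy_plan(state: CycleState) -> list[str]:
    steps = ["describe visual events"]
    if "needs domain knowledge" in state.feedback:
        steps.append("retrieve external knowledge")  # react to earlier feedback
    return steps


def toy_do(state: CycleState) -> str:
    if "retrieve external knowledge" in state.plan:
        return "The flame turns blue because combustion is more complete."
    return "The flame turns blue."


def toy_study(state: CycleState) -> tuple[bool, str]:
    # Pass only if the draft explains a cause, not just the visual event.
    if "because" in state.answer:
        return True, "ok"
    return False, "needs domain knowledge"


def toy_act(state: CycleState, note: str) -> None:
    state.feedback.append(note)


result = pdsa_loop(CycleState("Why does the flame turn blue?"),
                   toy_plan, toy_do, toy_study, toy_act)
print(result.iterations, result.answer)
# prints: 2 The flame turns blue because combustion is more complete.
```

Here the first round fails the Study check, the Act step records the feedback, and the second round's plan adds a knowledge-retrieval step that lets the answer pass. Passing each stage as a separate callable keeps the loop independent of which agent fills each role.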

Why does it matter?

This work pushes AI closer to being a truly helpful tool for science education. On SciVBench, SciEducator substantially outperforms leading closed-source models such as Gemini and GPT-4o as well as state-of-the-art video agents, opening the door to AI-powered tutoring and learning resources that can teach complex scientific concepts effectively.

Abstract

Recent advancements in multimodal large language models (MLLMs) and video agent systems have significantly improved general video understanding. However, when applied to scientific video understanding and educating, a domain that demands external professional knowledge integration and rigorous step-wise reasoning, existing approaches often struggle. To bridge this gap, we propose SciEducator, the first iterative self-evolving multi-agent system for scientific video comprehension and education. Rooted in the classical Deming Cycle from management science, our design reformulates its Plan-Do-Study-Act philosophy into a self-evolving reasoning and feedback mechanism, which facilitates the interpretation of intricate scientific activities in videos. Moreover, SciEducator can produce multimodal educational content tailored to specific scientific processes, including textual instructions, visual guides, audio narrations, and interactive references. To support evaluation, we construct SciVBench, a benchmark consisting of 500 expert-verified and literature-grounded science QA pairs across five categories, covering physical, chemical, and everyday phenomena. Extensive experiments demonstrate that SciEducator substantially outperforms leading closed-source MLLMs (e.g., Gemini, GPT-4o) and state-of-the-art video agents on the benchmark, establishing a new paradigm for the community.
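Scoring a system on a categorized benchmark like SciVBench usually means reporting accuracy per category and overall. The sketch below assumes each answer can be graded by case-insensitive exact match (for example, a multiple-choice letter); the paper's actual grading protocol is not described here, and the `per_category_accuracy` function and the category labels in the example are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def per_category_accuracy(
    records: Iterable[tuple[str, str, str]],
) -> tuple[dict[str, float], float]:
    """Return (accuracy per category, overall accuracy over all questions).

    Each record is (category, predicted_answer, gold_answer). Grading by
    normalized exact match is an assumption made for this sketch.
    """
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for category, predicted, gold in records:
        total[category] += 1
        if predicted.strip().lower() == gold.strip().lower():
            correct[category] += 1
    if not total:
        return {}, 0.0  # no records: avoid division by zero
    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values())  # micro average
    return per_category, overall


# Hypothetical records; real category names and answers come from the benchmark.
records = [
    ("physics", "A", "A"),
    ("physics", "B", "C"),
    ("chemistry", "d", "D"),
    ("everyday", "B", "B"),
]
per_category, overall = per_category_accuracy(records)
print(per_category, overall)
# prints: {'physics': 0.5, 'chemistry': 1.0, 'everyday': 1.0} 0.75
```

Overall accuracy here is micro-averaged (every question counts equally); a macro average over categories would weight each category equally instead, which matters if the five categories differ in size.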
