
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Max Ku, Thomas Chong, Jonathan Leung, Krish Shah, Alvin Yu, Wenhu Chen

2025-02-27


Summary

This paper introduces TheoremExplainAgent, an AI system that creates detailed videos explaining mathematical and scientific theorems through animations and text, making complex ideas easier to understand.

What's the problem?

AI models are good at explaining theorems in text, but they struggle to create visual explanations that convey the reasoning behind a concept. Text-only explanations also make it harder to spot deeper flaws in a model's understanding of a theorem.

What's the solution?

The researchers developed TheoremExplainAgent, which uses two AI agents: one plans the explanation and writes the narration script, while the other produces the animations by writing Python code for the Manim animation library. They also introduced a benchmark called TheoremExplainBench to test how well the AI explains 240 theorems from subjects like math, physics, chemistry, and computer science. Their system produced long-form videos (over five minutes) that revealed reasoning flaws more clearly than text-based explanations.
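The two-agent split described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the planner/coder division of labor: the function names, the storyboard format, and the emitted Manim-style snippet are assumptions for illustration, not the paper's actual implementation (which would call LLMs at each step).

```python
# Hypothetical sketch of a planner/coder agent pipeline for theorem videos.
# All names and prompts here are illustrative assumptions, not the paper's code.

def plan_explanation(theorem: str) -> list[str]:
    """Planner agent (stub): break a theorem into a scene-by-scene storyboard.
    A real system would call an LLM here to write the plan and narration."""
    return [
        f"Scene 1: state the theorem '{theorem}'",
        "Scene 2: illustrate the key idea with a diagram",
        "Scene 3: walk through the proof step by step",
    ]

def generate_scene_code(scene_description: str, index: int) -> str:
    """Coder agent (stub): emit Manim-style Python source for one scene.
    A real system would ask an LLM to write, run, and debug this code."""
    return (
        f"class Scene{index}(Scene):\n"
        f"    def construct(self):\n"
        f"        self.play(Write(Text({scene_description!r})))\n"
    )

def build_video_script(theorem: str) -> str:
    """Pipeline: plan first, then turn each planned scene into animation code."""
    storyboard = plan_explanation(theorem)
    return "\n".join(
        generate_scene_code(scene, i + 1) for i, scene in enumerate(storyboard)
    )

if __name__ == "__main__":
    print(build_video_script("Pythagorean theorem"))
```

The key design point this sketch captures is that planning happens before any animation code is written, which is what the paper found essential for producing coherent long-form videos.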

Why it matters?

This matters because visual explanations make it easier for people to understand complex ideas and identify mistakes in reasoning. It helps improve AI systems by revealing where their logic breaks down, and it could be useful in education by making STEM topics more accessible and engaging for students.

Abstract

Understanding domain-specific theorems often requires more than just text-based reasoning; effective communication through structured visual explanations is crucial for deeper comprehension. While large language models (LLMs) demonstrate strong performance in text-based theorem reasoning, their ability to generate coherent and pedagogically meaningful visual explanations remains an open challenge. In this work, we introduce TheoremExplainAgent, an agentic approach for generating long-form theorem explanation videos (over 5 minutes) using Manim animations. To systematically evaluate multimodal theorem explanations, we propose TheoremExplainBench, a benchmark covering 240 theorems across multiple STEM disciplines, along with 5 automated evaluation metrics. Our results reveal that agentic planning is essential for generating detailed long-form videos, and the o3-mini agent achieves a success rate of 93.8% and an overall score of 0.77. However, our quantitative and qualitative studies show that most of the videos produced exhibit minor issues with visual element layout. Furthermore, multimodal explanations expose deeper reasoning flaws that text-based explanations fail to reveal, highlighting the importance of multimodal explanations.