VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
Tingyu Song, Tongyan Hu, Guo Gan, Yilun Zhao
2025-05-30
Summary
This paper introduces VF-Eval, a new benchmark for testing how well multimodal AI models that handle both language and visuals can watch and give feedback on videos made by other AI systems.
What's the problem?
As more videos are created by AI, it's hard to tell whether those videos are good, make sense, or are actually useful, especially since existing AI models aren't always reliable at understanding or judging video content.
What's the solution?
The researchers built a benchmark called VF-Eval that tests AI models on four tasks related to understanding and giving feedback on AI-generated videos. They also showed that when these models are trained to align their feedback with human judgments, the quality of video generation improves.
Why it matters?
This is important because it helps make sure that AI-generated videos are better and more reliable. It also means that as AI gets better at checking its own work, we can trust the content it creates more, which is useful for everything from entertainment to education.
Abstract
A new benchmark, VF-Eval, evaluates the capabilities of MLLMs in interpreting AI-generated content (AIGC) videos across four tasks, highlighting remaining challenges and demonstrating that aligning models with human feedback benefits video generation.