MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation
Jihan Yao, Yushi Hu, Yujie Yi, Bin Han, Shangbin Feng, Guang Yang, Bingbing Wen, Ranjay Krishna, Lucy Lu Wang, Yulia Tsvetkov, Noah A. Smith, Banghua Zhu
2025-05-28
Summary
This paper introduces MMMG, a large new benchmark that tests how well AI models can handle many different tasks involving combinations of text, images, and audio.
What's the problem?
Current AI models are being used for more and more complicated jobs, like creating stories from pictures or generating music from text. However, there hasn't been a thorough way to measure how good they are at all these different tasks, especially one whose scores match what humans actually think is good.
What's the solution?
To solve this, the researchers created MMMG, a benchmark of 49 tasks and 937 instructions covering a wide range of things AI can do with text, images, and audio. Its automatic scoring is designed to closely match human judgment, and the results show where AI models still need to improve, especially in reasoning and in generating audio.
Why does it matter?
This is important because having a reliable way to test AI on many different types of tasks helps researchers and developers improve these models, making them more useful and trustworthy for everyone.
Abstract
MMMG is a comprehensive benchmark for multimodal generation, comprising 49 tasks and 937 instructions. Its automatic evaluation is designed to align closely with human judgment, and results on the benchmark reveal that current models still have room for improvement in reasoning and audio generation.