MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi, Maike Züfle, Marco Gaido, Beatrice Savoldi, Danni Liu, Ioannis Douros, Luisa Bentivogli, Jan Niehues
2025-08-04
Summary
This paper introduces MCIF, a new benchmark that tests how well AI models follow instructions across different languages and input types (speech, video, and text), using scientific talks as source material.
What's the problem?
Most existing benchmarks for AI models cover only one language or one type of input, such as text alone or speech alone, and they typically use short inputs. This makes it hard to know how well AI can handle more complex, real-world tasks that involve multiple languages and mixed data types over longer conversations or presentations.
What's the solution?
MCIF addresses this by providing a large, human-annotated dataset covering four languages and three input types. It includes tasks such as translation, summarization, and question answering over both short and long inputs, allowing AI models to be tested more thoroughly in realistic scientific settings.
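To make the benchmark's structure concrete, here is a minimal sketch of how one MCIF-style example might be represented in Python. The class name and all field names (modality, language, task, instruction, context, reference) are illustrative assumptions based on the description above, not the dataset's actual schema.

```python
from dataclasses import dataclass

# A minimal sketch of one MCIF-style benchmark item. All field names
# here are hypothetical, chosen to mirror the description above.
@dataclass
class MCIFExample:
    talk_id: str      # identifier of the source scientific talk
    modality: str     # "speech", "video", or "text"
    language: str     # target language for the model's output
    task: str         # e.g. "translation", "summarization", or "qa"
    instruction: str  # the natural-language instruction to follow
    context: str      # short- or long-form input drawn from the talk
    reference: str    # human-written reference output for scoring

# Hypothetical example: summarize a full talk in German from its audio.
example = MCIFExample(
    talk_id="talk_0001",
    modality="speech",
    language="de",
    task="summarization",
    instruction="Summarize this talk in German.",
    context="<path or handle to the talk's audio>",
    reference="<human reference summary>",
)
```

Grouping results by (language, modality, task) tuples like these would then make it possible to compare crosslingual and crossmodal performance directly.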
Why it matters?
This matters because it pushes AI to better understand and follow instructions across languages and formats, an important step toward smarter, more versatile AI assistants that can help people around the world in areas like science and technology.
Abstract
MCIF is a multilingual, human-annotated benchmark for evaluating instruction-following in crosslingual, multimodal settings using scientific talks.