MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation

Shoubin Yu, Yue Zhang, Ziyang Wang, Jaehong Yoon, Mohit Bansal

2025-06-23

Summary

This paper introduces MEXA, a system that combines the strengths of different expert AI models to improve how machines reason over multiple types of information, such as images and text, without requiring any additional training.

What's the problem?

AI models that specialize in different areas, such as understanding images, text, or other kinds of data, usually operate in isolation. Combining their knowledge effectively for reasoning is difficult and typically requires extra training to integrate them.

What's the solution?

The researchers created MEXA, a framework that uses a Large Reasoning Model to dynamically gather and combine the outputs of specialized expert models for each task, allowing the system to reason across different types of data without additional training.
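This aggregation idea can be sketched in a few lines: pick the relevant experts for a query, collect their textual outputs, and hand everything to a reasoning model. The code below is a minimal, hypothetical illustration with stub experts and a stub reasoner; the function names and selection logic are assumptions for demonstration, not MEXA's actual implementation.

```python
# Hypothetical sketch of training-free multi-expert aggregation.
# All expert functions and the reasoner below are illustrative stubs.

def image_expert(query):
    # Placeholder for a vision model's textual output.
    return "caption: a dog running in a park"

def audio_expert(query):
    # Placeholder for an audio model's textual output.
    return "transcript: birds chirping"

EXPERTS = {"image": image_expert, "audio": audio_expert}

def select_experts(modalities):
    # Dynamic selection: keep only the experts relevant to this query's modalities.
    return {name: fn for name, fn in EXPERTS.items() if name in modalities}

def aggregate(query, modalities, reasoner):
    # Collect each selected expert's output as labeled textual evidence,
    # then let the reasoning model produce the final answer.
    experts = select_experts(modalities)
    evidence = [f"[{name}] {fn(query)}" for name, fn in experts.items()]
    prompt = f"Question: {query}\nEvidence:\n" + "\n".join(evidence)
    return reasoner(prompt)

def stub_reasoner(prompt):
    # A real system would call a Large Reasoning Model here.
    return f"Answer based on {prompt.count('[')} expert(s)."

print(aggregate("What is happening?", ["image"], stub_reasoner))
```

Because the experts communicate through plain text, no gradient updates or joint training are needed; swapping experts in or out is just a change to the registry.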

Why does it matter?

This matters because it makes AI systems more flexible and more capable of solving complex problems that involve multiple types of information, with applications in fields like healthcare, robotics, and scientific research.

Abstract

MEXA is a training-free framework that aggregates outputs from specialized expert models using a Large Reasoning Model for effective multimodal reasoning across various domains.