MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
Xueqing Peng, Lingfei Qian, Yan Wang, Ruoyu Xiang, Yueru He, Yang Ren, Mingyang Jiang, Jeff Zhao, Huan He, Yi Han, Yun Feng, Yuechen Jiang, Yupeng Cao, Haohang Li, Yangyang Yu, Xiaoyu Wang, Penglei Gao, Shengyuan Lin, Keyi Wang, Shanshan Yang, Yilun Zhao, Zhiwei Liu
2025-06-18
Summary
This paper introduces MultiFinBen, a new benchmark for evaluating large language models on financial tasks across multiple languages and input types, including text, images, and audio. It examines how well these models handle real-world financial information in different languages and formats.
What's the problem?
Previous benchmarks for financial language models focused on a single language or input type and often relied on simple tasks. As a result, they failed to show how well models handle the complicated, mixed ways financial information appears in the real world.
What's the solution?
To address this, the researchers created MultiFinBen, which spans many tasks across several languages and modalities, such as reading financial documents that combine text and images and recognizing text within images. They also designed a system to measure task difficulty and selected the challenges that best expose where models struggle. Evaluating 22 leading models, they found that even the strongest ones have a hard time with difficult, cross-lingual, multimodal financial tasks.
Why it matters?
This matters because it provides a much more rigorous way to evaluate and improve AI models for finance, helping to build smarter tools that understand complex financial information from around the world, across different languages and formats.
Abstract
MultiFinBen is a multilingual, multimodal benchmark for financial-domain tasks. It evaluates LLMs across modalities and linguistic settings, revealing persistent challenges in complex cross-lingual and multimodal financial reasoning.