
FinMTEB: Finance Massive Text Embedding Benchmark

Yixuan Tang, Yi Yang

2025-02-19

Summary

This paper introduces FinMTEB, a new way to test how well AI models understand and work with financial texts. It's like creating a special exam for AI that focuses on finance topics, using real financial documents in both English and Chinese.

What's the problem?

Current AI models are good at understanding general language, but they struggle with specialized financial texts. It's like having a student who's great at general English but gets confused when reading complex financial reports. There wasn't a good way to test how well AI models handle these financial texts, which is important for real-world finance applications.

What's the solution?

The researchers created FinMTEB, which includes 64 financial datasets covering seven types of embedding tasks. They also built a special AI model called FinPersona-E5, trained specifically on financial language using synthetic data generated around different financial "personas." They then tested 15 different embedding models, including their new one, to see how well each performed on these financial tasks.

Why it matters?

This matters because it helps us create better AI tools for the finance world. By showing that AI models trained specifically for finance do better than general models, it encourages the development of more specialized AI. This could lead to more accurate and efficient AI systems for tasks like analyzing financial news, understanding company reports, and making sense of complex financial data. Surprisingly, they also found that a simple bag-of-words approach beat sophisticated dense embeddings on financial semantic similarity tasks, which could change how we approach AI in finance.
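To make the surprising bag-of-words finding concrete, here is a minimal sketch of what a BoW similarity baseline looks like. The tokenization (lowercased whitespace splitting) and raw-count weighting are illustrative assumptions, not the paper's exact setup:

```python
# Illustrative Bag-of-Words similarity baseline (assumed setup, not the
# authors' exact pipeline): represent each sentence as a sparse vector of
# word counts and compare with cosine similarity.
from collections import Counter
import math

def bow_vector(text: str) -> Counter:
    """Count lowercase word occurrences (a simple BoW representation)."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical pair of finance sentences sharing domain vocabulary:
s1 = "quarterly revenue rose on strong net interest income"
s2 = "net interest income drove the rise in quarterly revenue"
score = cosine_similarity(bow_vector(s1), bow_vector(s2))
print(round(score, 3))  # → 0.589
```

Financial texts reuse a lot of precise, shared terminology ("net interest income", "ESG", ticker symbols), which is one plausible reason exact-word overlap can rival dense embeddings on these similarity tasks.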

Abstract

Embedding models play a crucial role in representing and retrieving information across various NLP applications. Recent advances in large language models (LLMs) have further enhanced the performance of embedding models. While these models are often benchmarked on general-purpose datasets, real-world applications demand domain-specific evaluation. In this work, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks that cover diverse textual types in both Chinese and English, such as financial news articles, corporate annual reports, ESG reports, regulatory filings, and earnings call transcripts. We also develop a finance-adapted model, FinPersona-E5, using a persona-based data synthetic method to cover diverse financial embedding tasks for training. Through extensive evaluation of 15 embedding models, including FinPersona-E5, we show three key findings: (1) performance on general-purpose benchmarks shows limited correlation with financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words (BoW) approach outperforms sophisticated dense embeddings in financial Semantic Textual Similarity (STS) tasks, underscoring current limitations in dense embedding techniques. Our work establishes a robust evaluation framework for financial NLP applications and provides crucial insights for developing domain-specific embedding models.