The Massive Legal Embedding Benchmark (MLEB)

Umar Butler, Abdur-Rahman Butler, Adrian Lucas Malec

2025-10-24

Summary

This paper introduces the Massive Legal Embedding Benchmark (MLEB), a large collection of legal documents and questions designed to test how well AI embedding models can find and understand legal information.

What's the problem?

Until now, there was no single, comprehensive resource for researchers to evaluate how well AI systems perform on legal tasks across different countries and types of legal materials. Existing benchmarks were limited in scope, covering only certain areas of law or specific jurisdictions, which made it hard to get a complete picture of a system's abilities.

What's the solution?

The researchers created MLEB, which includes ten expert-annotated datasets covering legal materials from the US, UK, EU, Australia, Ireland, and Singapore. The datasets span different kinds of legal documents, including court cases, legislation, regulatory guidance, contracts, and legal literature, and they test different skills: finding relevant information (search), sorting documents into categories without task-specific training (zero-shot classification), and answering questions about the law. Seven of the datasets were built from scratch to fill gaps where open resources didn't exist, and everything, including the data, the code, and the results, is publicly available.
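
To make the search task concrete, here is a minimal sketch of how an embedding model might be scored on a retrieval dataset like those in MLEB: embed the queries and the documents, rank documents by cosine similarity, and check whether a labeled relevant document comes out on top. The model name, the toy corpus, and the relevance labels below are illustrative placeholders, not taken from MLEB, and the paper's own evaluation harness may differ.

# A minimal sketch of retrieval-style evaluation with an off-the-shelf embedding
# model. The corpus, queries, and relevance labels here are invented toy data.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model under test

corpus = [
    "A contract clause limiting the seller's liability for consequential damages.",
    "A statute setting the notice period for terminating a residential lease.",
    "A judgment on whether an email exchange can form a binding contract.",
]
queries = ["Can liability for consequential loss be excluded by contract?"]
relevant = {0: {0}}  # query index -> set of relevant corpus indices (toy labels)

# Normalized embeddings make cosine similarity a simple dot product.
corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)
query_emb = model.encode(queries, convert_to_tensor=True, normalize_embeddings=True)

scores = util.cos_sim(query_emb, corpus_emb)  # shape: (num_queries, num_docs)
for qi, row in enumerate(scores):
    ranked = row.argsort(descending=True).tolist()
    top1_hit = ranked[0] in relevant[qi]  # recall@1 for this query
    print(f"query {qi}: top-1 hit = {top1_hit}, ranking = {ranked}")

Real benchmarks typically report ranking metrics such as NDCG@10 averaged over many queries; the recall@1 check here just illustrates the shape of the task.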

Why it matters?

This benchmark is important because it provides a standardized way to measure and compare the performance of AI systems in the legal field. This will help drive improvements in legal technology, potentially leading to tools that can assist lawyers, make legal information more accessible, and improve the overall efficiency of the legal system.

Abstract

We present the Massive Legal Embedding Benchmark (MLEB), the largest, most diverse, and most comprehensive open-source benchmark for legal information retrieval to date. MLEB consists of ten expert-annotated datasets spanning multiple jurisdictions (the US, UK, EU, Australia, Ireland, and Singapore), document types (cases, legislation, regulatory guidance, contracts, and literature), and task types (search, zero-shot classification, and question answering). Seven of the datasets in MLEB were newly constructed in order to fill domain and jurisdictional gaps in the open-source legal information retrieval landscape. We document our methodology in building MLEB and creating the new constituent datasets, and release our code, results, and data openly to assist with reproducible evaluations.