From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation
Seokhee Hong, Sunkyoung Kim, Guijin Son, Soyeon Kim, Yeonjung Hong, Jinsik Lee
2025-07-15
Summary
This paper introduces KMMLU-Redux and KMMLU-Pro, two new Korean benchmarks designed to test how well large language models understand and perform in real Korean industrial and professional contexts.
What's the problem?
Existing benchmarks for evaluating language models often miss the specific knowledge required by Korean industries and professions. They either contain errors or rely on translations from other languages, which fail to capture local laws, culture, and specialized knowledge.
What's the solution?
The authors created KMMLU-Redux by fixing errors in the earlier KMMLU dataset to make it more reliable. They also built KMMLU-Pro, a new and more challenging benchmark drawn from official Korean professional licensing exam questions across various fields. Both benchmarks are carefully cleaned, checked for accuracy, and updated regularly with recent exam content.
Why it matters?
This matters because it provides a way to accurately evaluate AI language models on real-world Korean tasks, including specialized professional knowledge. It helps AI developers see where their models succeed or struggle in Korea-specific areas, encouraging more practical AI development tailored to local needs.
Abstract
Two Korean expert-level benchmarks, KMMLU-Redux and KMMLU-Pro, are introduced to evaluate Large Language Models in industrial and professional contexts.