RoMath: A Mathematical Reasoning Benchmark in Romanian

Adrian Cosma, Ana-Maria Bucur, Emilian Radoi

2024-09-19

Summary

This paper introduces RoMath, a benchmark suite for mathematical reasoning in Romanian. Its datasets are meant for training and evaluating language models on Romanian math problems.

What's the problem?

Most existing benchmarks for mathematical reasoning focus only on English, which limits the development of AI models that can understand and process mathematics in other languages, especially low-resource languages like Romanian. This lack of resources makes it difficult for non-English speakers to benefit from advancements in AI and machine learning.

What's the solution?

RoMath consists of three datasets: RoMath-Baccalaureate, RoMath-Competitions, and RoMath-Synthetic. Together they cover a range of mathematical topics and difficulty levels, allowing researchers to train and evaluate models specifically for Romanian. By focusing on Romanian, RoMath addresses the gaps left by English-centric resources and offers a dedicated alternative to relying on automatic translation of English benchmarks.

Why does it matter?

This research is important because it promotes the development of AI technologies in underrepresented languages, helping to ensure that advancements in machine learning are accessible to a wider audience. By creating resources like RoMath, researchers can enhance AI's ability to understand mathematical concepts in Romanian, which can lead to better educational tools and applications for speakers of this language.

Abstract

Mathematics has long been conveyed through natural language, primarily for human understanding. With the rise of mechanized mathematics and proof assistants, there is a growing need to understand informal mathematical text, yet most existing benchmarks focus solely on English, overlooking other languages. This paper introduces RoMath, a Romanian mathematical reasoning benchmark suite comprising three datasets: RoMath-Baccalaureate, RoMath-Competitions and RoMath-Synthetic, which cover a range of mathematical domains and difficulty levels, aiming to improve non-English language models and promote multilingual AI development. By focusing on Romanian, a low-resource language with unique linguistic features, RoMath addresses the limitations of Anglo-centric models and emphasizes the need for dedicated resources beyond simple automatic translation. We benchmark several open-weight language models, highlighting the importance of creating resources for underrepresented languages. We make the code and dataset available.