InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning
Bo-Wen Zhang, Yan Yan, Lin Li, Guang Liu
2024-08-15

Summary
This paper introduces InfinityMATH, a dataset designed to improve how language models reason about and solve mathematical problems by teaching reasoning patterns that do not depend on specific numbers.
What's the problem?
Creating large datasets for training language models to reason mathematically is difficult and expensive. Existing methods rely on large amounts of seed data and heavy computation to synthesize new examples, which makes the process hard to scale for broader use.
What's the solution?
InfinityMATH addresses these challenges by decoupling problems from their specific numbers, so the dataset teaches reasoning programs rather than memorized values, and models learn to solve problems more flexibly and efficiently (see the sketch below). The authors fine-tuned popular open-source models such as Llama2 and CodeLlama on InfinityMATH and showed that they improved significantly on both familiar (in-domain) and new (out-of-domain) types of math problems.
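To make "number-independent" concrete, here is a minimal sketch (a hypothetical illustration, not the authors' actual pipeline): the numbers in a word problem are lifted out as parameters of a small program, so one template can be instantiated into many training examples by sampling new values.

```python
import random

# Hypothetical number-independent program: the reasoning steps are fixed,
# while every concrete number is a parameter. (Illustrative only; the
# paper's actual templates and pipeline may differ.)
def solve(apples_per_basket: int, baskets: int, eaten: int) -> int:
    """Total apples across all baskets, minus the ones eaten."""
    total = apples_per_basket * baskets
    return total - eaten

# Scaling is cheap: sampling fresh numbers mints new training instances
# without calling an LLM for each one.
rng = random.Random(42)
for _ in range(3):
    a, b, e = rng.randint(2, 12), rng.randint(2, 10), rng.randint(1, 5)
    question = (f"Each basket holds {a} apples and there are {b} baskets. "
                f"If {e} apples are eaten, how many apples remain?")
    print(question, "->", solve(a, b, e))
```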
Why it matters?
This research is important because it makes it easier and cheaper to train language models on mathematical reasoning, which can help develop better AI tools for education, research, and various applications in science and technology. By providing this dataset to the public, the authors encourage further research and innovation in the field.
Abstract
Recent advancements in Chain-of-Thoughts (CoT) and Program-of-Thoughts (PoT) methods have greatly enhanced language models' mathematical reasoning capabilities, facilitating their integration into instruction tuning datasets synthesized with LLMs. However, existing methods for large-scale dataset creation require substantial seed data and high computational costs for data synthesis, posing significant challenges for scalability. We introduce InfinityMATH, a scalable instruction tuning dataset for programmatic mathematical reasoning. The construction pipeline emphasizes decoupling numbers from mathematical problems to synthesize number-independent programs, enabling efficient and flexible scaling while minimizing dependency on specific numerical values. Fine-tuning experiments with open-source language and code models, such as Llama2 and CodeLlama, demonstrate the practical benefits of InfinityMATH. These fine-tuned models showed significant relative improvements on both in-domain and out-of-domain benchmarks, ranging from 184.7% to 514.3% on average. Additionally, these models exhibited high robustness on the GSM8K+ and MATH+ benchmarks, enhanced versions of the test sets that vary only the numbers. InfinityMATH ensures that models are more versatile and effective across a broader range of mathematical problems. The data is available at https://huggingface.co/datasets/flagopen/InfinityMATH.
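The GSM8K+ and MATH+ benchmarks mentioned in the abstract vary only the numbers in existing test items. Below is a rough sketch of how such a variant might be generated; this is assumed from the abstract alone, and the authors' actual construction may differ.

```python
import random
import re

def vary_numbers(question: str, seed: int = 0) -> str:
    """Replace each integer literal in a test question with a nearby value,
    producing a surface variant that requires the same reasoning.
    (Hypothetical sketch; GSM8K+/MATH+ may be built differently.)"""
    rng = random.Random(seed)
    return re.sub(r"\d+", lambda m: str(int(m.group()) + rng.randint(1, 5)), question)

print(vary_numbers("Tom has 3 boxes with 4 pens in each box. How many pens does he have?"))
```

Note that perturbing the numbers changes the gold answer, which must then be recomputed; a number-independent program makes this trivial, since the same program can simply be re-executed with the new values.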