
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Xiaochuan Li, Zichun Yu, Chenyan Xiong

2024-10-21


Summary

This paper presents Montessori-Instruct, a new framework that generates synthetic training data tailored to how a 'student' language model learns, making the training of language models more effective.

What's the problem?

Synthetic data is widely used to train language models, but it is often noisy, uninformative, or misleading, which makes it hard for models to learn effectively. A teacher model that generates data without feedback may produce examples that do not match what the student model actually needs at its current stage of training. The challenge is to create training data that genuinely supports the student's learning preferences without introducing confusing signals.

What's the solution?

To solve this problem, the authors developed Montessori-Instruct, which starts by measuring how individual synthetic examples affect the student model's learning, a quantity the paper calls local data influence. These influence scores characterize the student's learning preferences and are used to build preference pairs for Direct Preference Optimization (DPO), which trains the 'teacher' model to generate data the student benefits from most. By analyzing which kinds of synthetic data actually help the student, the teacher produces more relevant and effective training material, and the authors' experiments show this approach significantly outperforms standard data-synthesis methods. A rough outline of the loop is sketched below.
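To make the loop concrete, here is a minimal Python sketch of one round of this idea. It is not the authors' implementation: the helpers generate_candidates, finetune_one_step, eval_loss, and dpo_update are hypothetical placeholders standing in for a real generation, fine-tuning, evaluation, and DPO training pipeline, and the chosen/rejected pairing simply contrasts the most and least influential candidates.

```python
import copy

def local_data_influence(student, example, reference_set):
    # Influence of one synthetic example, approximated as the drop in
    # reference loss after briefly training a copy of the student on it.
    loss_before = eval_loss(student, reference_set)          # hypothetical helper
    probe = finetune_one_step(copy.deepcopy(student), example)  # hypothetical helper
    loss_after = eval_loss(probe, reference_set)
    return loss_before - loss_after  # larger = more helpful to this student

def montessori_instruct_round(teacher, student, prompts, reference_set):
    preference_pairs = []
    for prompt in prompts:
        # Teacher proposes several candidate training examples per prompt.
        candidates = generate_candidates(teacher, prompt, n=4)  # hypothetical helper
        ranked = sorted(
            candidates,
            key=lambda ex: local_data_influence(student, ex, reference_set),
            reverse=True,
        )
        # Most vs. least influential candidate become the chosen/rejected pair.
        preference_pairs.append(
            {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}
        )
    # Update the teacher with DPO on the student-derived preference pairs.
    return dpo_update(teacher, preference_pairs)  # hypothetical helper
```

The key design choice this sketch tries to convey is that the student's own training signal, not a fixed heuristic, decides which teacher outputs count as "preferred" when the teacher is optimized.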

Why it matters?

This research is important because it improves how we train AI language models: rather than generating synthetic data blindly, the teacher learns to produce data that is measurably useful to the specific student model being trained. Because the approach is driven by the student's own learning signals, it outperforms standard synthesis pipelines, beats data generated by a stronger teacher model (GPT-4o), and remains robust across different student models, which makes it a practical recipe for building better instruction-following models.

Abstract

Synthetic data has been widely used to train large language models, but their generative nature inevitably introduces noisy, non-informative, and misleading learning signals. In this paper, we propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model's learning process. Specifically, we utilize local data influence of synthetic training data points on students to characterize students' learning preferences. Then, we train the teacher model with Direct Preference Optimization (DPO) to generate synthetic data tailored toward student learning preferences. Experiments with Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and MT-Bench demonstrate that Montessori-Instruct significantly outperforms standard synthesis methods by 18.35% and 46.24% relatively. Our method also beats data synthesized by a stronger teacher model, GPT-4o. Further analysis confirms the benefits of teacher's learning to generate more influential training data in the student's improved learning, the advantages of local data influence in accurately measuring student preferences, and the robustness of Montessori-Instruct across different student models. Our code and data are open-sourced at https://github.com/cxcscmu/Montessori-Instruct.
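Since the abstract mentions training the teacher with DPO on student-derived preferences, here is a hedged example of what a single preference record might look like. The prompt/chosen/rejected layout follows the common convention used by DPO implementations such as TRL; the texts themselves are invented placeholders, not data from the paper.

```python
# One invented preference record for teacher DPO training (illustrative only).
preference_example = {
    "prompt": "Write an instruction that teaches summarizing a scientific abstract.",
    "chosen": "Candidate instruction judged most influential for the student "
              "(largest drop in reference loss after a brief update).",
    "rejected": "Candidate instruction judged least influential for the student "
                "(little or no drop in reference loss).",
}
```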