Magistral

Mistral-AI, Abhinav Rastogi, Albert Q. Jiang, Andy Lo, Gabrielle Berrada, Guillaume Lample, Jason Rute, Joep Barmentlo, Karmesh Yadav, Kartik Khandelwal, Khyathi Raghavi Chandu, Léonard Blier, Lucile Saulnier, Matthieu Dinot, Maxime Darrin, Neha Gupta, Roman Soletskyi, Sagar Vaze, Teven Le Scao, Yihan Wang, Adam Yang, Alexander H. Liu

2025-06-15

Summary

Magistral is a reasoning model from Mistral AI trained with a reinforcement learning (RL) pipeline built entirely from scratch. Training on text alone improves the model's reasoning and instruction following, and even preserves its ability to handle other kinds of input such as images, all without relying on RL traces distilled from existing reasoning models.

What's the problem?

Training large language models with reinforcement learning typically starts from reasoning traces distilled from prior RL-trained models. This dependence on earlier work limits how freely a model can explore new reasoning strategies and makes the training process harder to control and reproduce.

What's the solution?

The researchers built a scalable RL training system from the ground up, in which multiple asynchronous workers generate answers, verify them, and update the model continuously without pausing generation. The system encourages exploration and improves the model's reasoning and instruction-following skills purely from text data, using an algorithm called Group Relative Policy Optimization (GRPO), which scores each answer relative to other answers sampled for the same prompt instead of relying on a separate learned value function.
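The key idea in GRPO can be sketched in a few lines: sample several answers per prompt, score them with a verifier, and normalize each reward against the group's mean and standard deviation. This is a minimal illustrative sketch, not code from the Magistral pipeline, and the function name is invented for this example.

```python
# Illustrative sketch of the group-relative advantage at the heart of
# Group Relative Policy Optimization (GRPO). Names are hypothetical,
# not taken from the Magistral codebase.

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and standard deviation
    of its group (all answers sampled for the same prompt), so no
    learned value function ("critic") is needed."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled answers to one prompt, scored 1 (correct)
# or 0 (incorrect) by a verifier. Correct answers get positive
# advantages, incorrect ones negative.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because the baseline comes from the group itself, only answers that beat their siblings are reinforced, which is what pushes the model toward better reasoning without any pre-existing RL data.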

Why it matters?

This matters because it shows that large language models can be improved with reinforcement learning built from scratch, without depending on traces from earlier RL-trained models. That independence gives researchers more freedom to customize and extend models, and the resulting gains carry over to handling multiple types of content, making the models more capable on real-world tasks.

Abstract

Magistral, a scalable reinforcement learning pipeline, demonstrates that RL can enhance multimodal understanding and instruction following in large language models without requiring existing RL traces.