MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Xingxuan Li, Yao Xiao, Dianwen Ng, Hai Ye, Yue Deng, Xiang Lin, Bin Wang, Zhanfeng Mo, Chong Zhang, Yueyi Zhang, Zonglin Yang, Ruilin Li, Lei Lei, Shihao Xu, Han Zhao, Weiling Chen, Feng Ji, Lidong Bing

2025-07-22


Summary

This paper introduces MiroMind-M1, a fully open-source series of reasoning language models built to solve complex math problems by working through them step by step, much as a person would.

What's the problem?

Many AI models struggle with mathematical reasoning: they tend to push straight toward an answer without taking the full context into account or carefully breaking the problem into steps.

What's the solution?

The authors trained MiroMind-M1 in two stages. First, the model learned from a large set of math problems with detailed step-by-step solutions. Then it was further trained with a reinforcement learning method that rewards correct answers and discourages repetitive output. This method helps the model adapt its reasoning to the context and improves its accuracy.
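To make the second stage more concrete, here is a minimal sketch of a reward that gives credit for a correct final answer and subtracts a penalty for repeated text. It is an illustration only and not the paper's actual reward: the function names `repetition_fraction` and `toy_reward`, the n-gram measure of repetition, the exact-match answer check, and the `penalty_weight` value are all assumptions made for this example.

```python
from collections import Counter


def repetition_fraction(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram (hypothetical measure)."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence of an n-gram beyond its first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def toy_reward(response_tokens, predicted_answer, reference_answer,
               penalty_weight=0.5, n=3):
    """Correctness reward minus a repetition penalty (assumed form, not the paper's)."""
    # Assumption: exact string match stands in for a real answer verifier.
    correctness = 1.0 if predicted_answer.strip() == reference_answer.strip() else 0.0
    return correctness - penalty_weight * repetition_fraction(response_tokens, n)


if __name__ == "__main__":
    clean = "first add two then multiply by three".split()
    loopy = "so x is two so x is two so x is two".split()
    print(toy_reward(clean, "6", "6"))  # correct answer, no repeated 3-grams
    print(toy_reward(loopy, "6", "6"))  # correct answer, but penalized for looping
```

Under these assumptions, two responses with the same correct answer receive different rewards when one of them loops, which is the behavior the summary describes in words.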

Why it matters?

This matters because it brings open-source math reasoning models closer to the level of closed-source ones, making powerful tools available to more researchers and developers. It also shows progress toward AI that can handle complex, logical reasoning.

Abstract

The MiroMind-M1 series, built on Qwen-2.5, introduces fully open-source reasoning language models with state-of-the-art performance in mathematical reasoning through Context-Aware Multi-Stage Policy Optimization.
