MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang
2025-05-13

Summary
This paper introduces MiMo-7B, a 7-billion-parameter language model built to excel at reasoning tasks like solving math and programming problems, even though it is much smaller than many competing models.
What's the problem?
The problem is that many large language models struggle with complex reasoning, especially in math and coding, and the usual way to make them better is to make them much bigger, which costs far more compute, money, and energy.
What's the solution?
The researchers built MiMo-7B with a pre-training recipe that mixes in reasoning-dense data and adds a Multi-Token Prediction objective, so the model learns to predict several upcoming tokens at once instead of only the next one. After that, they post-trained it with reinforcement learning on math and programming problems whose answers can be checked automatically, making the model even better at reasoning. As a result, MiMo-7B outperforms some much larger models on these tough reasoning tasks.
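To make the multi-token prediction idea concrete, here is a minimal, hypothetical sketch in PyTorch: a toy model with one extra output head per future offset, trained to predict the tokens one, two, and three steps ahead from every position. The backbone, sizes, and head design are illustrative assumptions for this summary, not the MiMo-7B architecture described in the paper.

```python
# Minimal sketch of a multi-token prediction (MTP) training objective.
# Toy GRU backbone and made-up sizes; NOT the MiMo-7B architecture, only
# an illustration of predicting several future tokens from each position.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, N_FUTURE = 1000, 64, 3  # hypothetical vocabulary size, width, horizon

class TinyMTPModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.backbone = nn.GRU(DIM, DIM, batch_first=True)  # stand-in for a transformer
        # One output head per future offset: heads[k] predicts the token k+1 steps ahead.
        self.heads = nn.ModuleList(nn.Linear(DIM, VOCAB) for _ in range(N_FUTURE))

    def forward(self, tokens):
        hidden, _ = self.backbone(self.embed(tokens))    # (batch, seq, DIM)
        return [head(hidden) for head in self.heads]     # list of (batch, seq, VOCAB)

def mtp_loss(logits_per_offset, tokens):
    """Average cross-entropy over all future offsets that fit in the sequence."""
    losses = []
    for k, logits in enumerate(logits_per_offset, start=1):
        # Position t predicts the token at position t + k, so trim both ends to align.
        pred = logits[:, :-k, :].reshape(-1, VOCAB)
        target = tokens[:, k:].reshape(-1)
        losses.append(F.cross_entropy(pred, target))
    return torch.stack(losses).mean()

if __name__ == "__main__":
    model = TinyMTPModel()
    batch = torch.randint(0, VOCAB, (2, 16))   # fake token ids
    loss = mtp_loss(model(batch), batch)
    loss.backward()                            # trains the 1/2/3-tokens-ahead heads jointly
    print(f"MTP loss: {loss.item():.3f}")
```

In this sketch the extra heads only add training signal; at inference time a model like this can still be used for ordinary next-token generation (or for speculative decoding), which is the general motivation for the technique.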
Why it matters?
This matters because it shows that you don't always need the biggest model to get the best results. Smarter training can make smaller models powerful, which saves time, money, and energy while still solving complicated problems.
Abstract
MiMo-7B is a 7-billion-parameter language model optimized for reasoning: its pre-training combines a curated data mixture with Multi-Token Prediction, and its post-training applies reinforcement learning to math and programming problems, achieving performance that surpasses larger models.