MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

MiniMax, Aili Chen, Aonian Li, Bangwei Gong, Binyang Jiang, Bo Fei, Bo Yang, Boji Shan, Changqing Yu, Chao Wang, Cheng Zhu, Chengjun Xiao, Chengyu Du, Chi Zhang, Chu Qiao, Chunhao Zhang, Chunhui Du, Congchao Guo, Da Chen, Deming Ding, Dianjun Sun, Dong Li

2025-06-17

Summary

This paper introduces MiniMax-M1, a large-scale reasoning model that combines a hybrid attention design, featuring a fast 'lightning attention' mechanism, with a Mixture-of-Experts architecture to handle extremely long text inputs efficiently. It can process contexts of up to 1 million tokens, far longer than most other models, while using substantially less compute at inference (test) time.

What's the problem?

The problem is that AI models that reason over language usually struggle when the input text is very long, because attention, the mechanism that lets the model weigh which parts of the input matter, becomes very slow and memory-hungry as the input grows: the cost of standard softmax attention scales quadratically with sequence length. This makes it hard to train and deploy these models on complex tasks that require looking at a lot of information at the same time.
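To make the scaling problem concrete, here is a rough back-of-the-envelope comparison (a simplification for illustration, not from the paper): softmax attention builds an n-by-n score matrix, so its cost grows quadratically with sequence length n, while linear-attention variants like lightning attention keep a fixed-size running state and scale roughly linearly.

```python
# Illustrative FLOP counts per attention head (simplified; ignores
# constants, softmax cost, and multi-head/batch dimensions).

def softmax_attention_flops(n: int, d: int) -> int:
    # QK^T scores (n*n*d) plus the weighted sum over values (n*n*d)
    return 2 * n * n * d

def linear_attention_flops(n: int, d: int) -> int:
    # associativity trick: maintain a d x d state updated once per token
    return 2 * n * d * d

d = 128  # head dimension (illustrative value, not from the paper)
for n in (8_000, 1_000_000):
    ratio = softmax_attention_flops(n, d) / linear_attention_flops(n, d)
    print(f"n={n:>9}: softmax costs ~{ratio:,.0f}x more than linear")
```

The ratio works out to n/d, so at a 1-million-token context the quadratic term dominates by orders of magnitude, which is why purely softmax-based models become impractical at that scale.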

What's the solution?

The solution was to build MiniMax-M1 with a hybrid design that interleaves traditional softmax-attention transformer blocks with 'lightning attention' blocks, a linear-attention variant that is much faster and more efficient on long sequences. It also uses a Mixture-of-Experts architecture that activates only the parts of the network needed for each token, saving compute. This setup lets the model understand and reason over very long inputs while keeping computation costs low. The team also developed a new reinforcement learning algorithm to train the model efficiently on complex problems, finishing training faster and more cheaply than comparable approaches.
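The Mixture-of-Experts idea described above can be sketched in a few lines. This is a minimal illustrative implementation of top-k expert routing in NumPy, not the paper's actual architecture: a gate scores each expert per token, only the highest-scoring k experts run, and their outputs are mixed by the gate's softmax weights, so most parameters stay inactive for any given token.

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d = 8, 2, 16  # illustrative sizes, not the paper's

# Each "expert" here is a tiny one-layer ReLU network stand-in.
gate_w = rng.normal(size=(d, num_experts))
expert_w = rng.normal(size=(num_experts, d, d))

def expert(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    return np.maximum(x @ weights, 0.0)

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                            # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert ids per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()                       # softmax over chosen experts
        for p, e in zip(probs, top[t]):            # only k of 8 experts run
            out[t] += p * expert(expert_w[e], x[t])
    return out

tokens = rng.normal(size=(4, d))
print(moe_layer(tokens).shape)
```

With top_k=2 of 8 experts, only a quarter of the expert parameters are touched per token, which is the source of the compute savings the paragraph describes; MiniMax-M1 applies the same principle at a vastly larger scale.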

Why it matters?

This matters because handling very long inputs quickly and efficiently lets AI models tackle harder, more detailed reasoning and real-world tasks, such as software engineering or understanding large document collections. By making such models both more capable and more affordable to train and run, MiniMax-M1 pushes AI toward problems that require deep, extended thinking.

Abstract

A hybrid-attention reasoning model called MiniMax-M1, featuring a Mixture-of-Experts architecture and lightning attention mechanism, is introduced for efficient long-input processing and reinforcement learning.