
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

StepFun, Bin Wang, Bojun Wang, Changyi Wan, Guanzhe Huang, Hanpeng Hu, Haonan Jia, Hao Nie, Mingliang Li, Nuo Chen, Siyu Chen, Song Yuan, Wuxun Xie, Xiaoniu Song, Xing Chen, Xingping Yang, Xuelin Zhang, Yanbo Yu, Yaoyu Wang, Yibo Zhu, Yimin Jiang, Yu Zhou

2025-07-31

Summary

This paper introduces Step-3, a very large vision-language model (VLM) with 321 billion parameters, designed to be both powerful and affordable by optimizing how it processes and generates text.

What's the problem?

Decoding, the process by which a model generates its output token by token, is expensive and slow for large models, especially with long contexts or complex reasoning tasks. This limits their practical use.

What's the solution?

Step-3 addresses this through a model-system co-design built on two techniques: Multi-Matrix Factorization Attention (MFA), which cuts the memory and computation the attention mechanism needs, and Attention-FFN Disaggregation (AFD), which separates the model's attention and feed-forward (FFN) stages so each can run more efficiently. Together, these reduce decoding cost and increase the speed at which the model produces tokens.
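To make the attention-side savings concrete, here is a toy sketch (not the paper's actual implementation) of the general idea behind factorized attention with a shared key/value head: the query projection is factored through a low-rank bottleneck, and a single K/V head serves all query heads, which shrinks the per-token KV cache. All dimensions below are illustrative assumptions, not Step-3's real configuration.

```python
import numpy as np

# Illustrative sizes only; r is the low-rank bottleneck for the query projection.
d_model, n_heads, d_head, r = 512, 8, 64, 32

rng = np.random.default_rng(0)

# Standard multi-head attention caches K and V for every head per token:
kv_cache_mha = 2 * n_heads * d_head   # floats cached per token
# With one shared K/V head for all query heads, the cache shrinks to:
kv_cache_shared = 2 * d_head

W_q_down = rng.standard_normal((d_model, r)) * 0.02          # query factor 1
W_q_up = rng.standard_normal((r, n_heads * d_head)) * 0.02   # query factor 2
W_k = rng.standard_normal((d_model, d_head)) * 0.02          # single shared key head
W_v = rng.standard_normal((d_model, d_head)) * 0.02          # single shared value head

def factorized_attention(x):
    """x: (seq_len, d_model) -> (seq_len, n_heads * d_head)."""
    q = (x @ W_q_down) @ W_q_up              # factorized query projection
    q = q.reshape(len(x), n_heads, d_head)
    k, v = x @ W_k, x @ W_v                  # one K/V head shared by all query heads
    scores = np.einsum('qhd,kd->hqk', q, k) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    out = np.einsum('hqk,kd->qhd', weights, v)
    return out.reshape(len(x), n_heads * d_head)

x = rng.standard_normal((10, d_model))
print(factorized_attention(x).shape)          # (10, 512)
print(kv_cache_mha // kv_cache_shared)        # 8x smaller KV cache in this sketch
```

In this sketch the cache shrinks by a factor equal to the number of query heads; the actual savings in Step-3 depend on its real architecture and configuration, which this summary does not detail.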

Why it matters?

This matters because lowering the cost and increasing the speed of decoding means large models like Step-3 can be deployed more widely and effectively in real-world applications, letting capable AI run faster and cheaper on available hardware.

Abstract

Step-3, a 321B-parameter VLM, optimizes decoding costs through Multi-Matrix Factorization Attention and Attention-FFN Disaggregation, achieving high efficiency and throughput compared to other models.