Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs

Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo, Peng Guo, Yaoyuan Wang, Xiaojun Meng, Yasheng Wang, Dong Li, Can Chen, Dandan Tu, Yin Li, Fisher Yu, Ruiming Tang, Yunhe Wang, Baojun Wang, Bin Wang, Bo Wang, Boxiao Liu, Changzheng Zhang

2025-04-14

Summary

This paper talks about Pangu Ultra, a dense language model with 135 billion parameters that achieves state-of-the-art results among models of its kind. The researchers used special training techniques and ran the entire training on Huawei's Ascend NPUs, specialized AI processors, to make the process faster and more efficient.

What's the problem?

The problem is that as language models get bigger and deeper, they become much harder to train: they need enormous amounts of computing power, which makes them slow and expensive to build. On top of that, very deep models tend to become unstable during training, so keeping them learning smoothly and accurately is a big challenge.

What's the solution?

To solve this, the team used a method called depth-scaled sandwich normalization, which normalizes the signal both before and after each layer's computation and scales that normalization with the model's depth, so the model stays stable and keeps learning well as it gets deeper and larger (a simplified sketch of this idea appears below). They also took advantage of Ascend NPUs, special processors designed to handle the heavy lifting needed to train giant AI models efficiently. This combination allowed Pangu Ultra to reach top performance without wasting resources.
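To make the idea more concrete, here is a minimal PyTorch sketch of sandwich normalization with a depth-scaled gain. The RMSNorm and SandwichBlock classes and the 1/sqrt(2 * num_layers) initialization schedule are illustrative assumptions for this explainer, not the paper's exact implementation.

```python
import math
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm with a learnable gain, as is standard in modern LLMs."""
    def __init__(self, dim, gain_init=1.0, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.full((dim,), float(gain_init)))

    def forward(self, x):
        # Scale each vector by the reciprocal of its root-mean-square.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SandwichBlock(nn.Module):
    """One transformer sub-layer with sandwich normalization: the input
    is normalized before the sub-layer (pre-norm) and the sub-layer's
    output is normalized again (post-norm) before being added back to
    the residual stream."""
    def __init__(self, dim, sublayer, num_layers):
        super().__init__()
        self.pre_norm = RMSNorm(dim)
        # Depth-scaled init (assumed schedule): the post-norm gain starts
        # small when the network is deep, so each of the num_layers blocks
        # perturbs the residual stream only gently at initialization.
        self.post_norm = RMSNorm(dim, gain_init=1.0 / math.sqrt(2.0 * num_layers))
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.post_norm(self.sublayer(self.pre_norm(x)))

# Toy example: one block of a hypothetical 94-layer model, using a
# plain linear layer as a stand-in for attention or a feed-forward net.
block = SandwichBlock(dim=512, sublayer=nn.Linear(512, 512), num_layers=94)
out = block(torch.randn(2, 16, 512))  # (batch, seq_len, dim) -> same shape
```

The intuition behind the depth scaling is that with dozens of stacked blocks, even small per-layer output growth compounds; shrinking the post-norm gain with depth keeps the residual stream's magnitude under control, which is the kind of stability the paper targets.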

Why it matters?

This work matters because it shows how we can build and train extremely large and powerful language models more efficiently. With these improvements, advanced AI models like Pangu Ultra can become more accessible, reliable, and useful for things like language understanding, writing, and even scientific research.

Abstract

Pangu Ultra, a large language model with 135 billion parameters, achieves state-of-the-art performance using depth-scaled sandwich normalization and efficient training on Ascend NPUs.