Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, Zhongzhi Chen, Xuemeng Huang, Fengzong Lian, Saiyong Yang, Jianfeng Yan, Yuyuan Zeng, Xiaoqin Ren, Chao Yu, Lulu Wu, Yue Mao, Tao Yang
2024-11-05

Summary
This paper introduces Hunyuan-Large, a new open-source model developed by Tencent that uses a mixture-of-experts (MoE) architecture. It has 389 billion total parameters, of which 52 billion are activated per token, making it the largest open-source Transformer-based MoE model released to date and suitable for a wide range of tasks such as language understanding, reasoning, and coding.
What's the problem?
As AI models grow in size and complexity, they need to perform well across many different tasks without requiring excessive computational resources. Traditional dense models, which use all of their parameters for every token, often become inefficient at this scale and can struggle with long texts or complex reasoning tasks.
What's the solution?
Hunyuan-Large addresses these issues with a mixture-of-experts design: for each input token, a router activates only a small subset of the model's experts rather than the full network, so far fewer parameters are used per token (a minimal sketch of this routing idea appears below). The model also supports long contexts of up to 256,000 tokens while maintaining high performance. It was trained on a massive corpus that includes large amounts of high-quality synthetic data, which improves what it can learn, and it uses additional techniques such as key-value cache compression and expert-specific learning rates to optimize efficiency and training.
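To make the routing idea concrete, here is a minimal sketch of a "shared + routed expert" MoE layer, in which every token passes through a shared expert and a router picks one specialized expert per token. The layer sizes, expert count, and top-1 routing shown here are illustrative assumptions for the example, not the actual Hunyuan-Large configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedExpertRoutingLayer(nn.Module):
    """Sketch of a shared-plus-routed MoE feed-forward layer.
    Dimensions and expert count are made up for illustration."""

    def __init__(self, d_model=1024, d_ff=4096, n_routed_experts=16):
        super().__init__()
        # One shared expert that every token always passes through.
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )
        # A pool of specialized experts; the router picks one per token.
        self.routed_experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_routed_experts)
        )
        self.router = nn.Linear(d_model, n_routed_experts)

    def forward(self, x):
        # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        top_score, top_idx = gate.max(dim=-1)      # top-1 routing per token
        routed_out = torch.zeros_like(x)
        for e, expert in enumerate(self.routed_experts):
            mask = top_idx == e                    # tokens routed to expert e
            if mask.any():
                routed_out[mask] = top_score[mask, None] * expert(x[mask])
        # Shared expert output plus the selected specialized expert's output.
        return self.shared_expert(x) + routed_out
```

Because only one specialized expert runs per token, the compute per token stays close to that of a much smaller dense model even though the total parameter count (all experts combined) is far larger.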
Why it matters?
This research is significant because Hunyuan-Large sets a new standard for open-source MoE models, demonstrating that high accuracy and efficiency can be achieved simultaneously. By outperforming Llama 3.1-70B and matching the much larger Llama 3.1-405B on key benchmarks, Hunyuan-Large offers researchers and developers a valuable open tool, making advanced capabilities more accessible for a wide range of applications.
Abstract
In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits comparable performance to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we investigate the scaling laws and learning rate schedules of mixture of experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications.
Code: https://github.com/Tencent/Hunyuan-Large
Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large
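The abstract mentions a key-value cache compression technique for handling long contexts efficiently. One widely used way to shrink the KV cache is grouped-query attention, in which many query heads share a smaller set of key/value heads so that far fewer keys and values must be cached per token. The sketch below illustrates that general idea under assumed head counts and model dimensions; it is not necessarily the exact compression scheme used in Hunyuan-Large.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Illustrative grouped-query attention: 16 query heads share 4
    key/value heads, so the KV cache is 4x smaller than standard
    multi-head attention. All sizes are assumptions for this sketch."""

    def __init__(self, d_model=1024, n_q_heads=16, n_kv_heads=4):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.h_q, self.h_kv = n_q_heads, n_kv_heads
        self.d_head = d_model // n_q_heads
        self.wq = nn.Linear(d_model, n_q_heads * self.d_head)
        self.wk = nn.Linear(d_model, n_kv_heads * self.d_head)  # fewer K heads
        self.wv = nn.Linear(d_model, n_kv_heads * self.d_head)  # fewer V heads
        self.wo = nn.Linear(n_q_heads * self.d_head, d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.h_q, self.d_head).transpose(1, 2)
        k = self.wk(x).view(b, t, self.h_kv, self.d_head).transpose(1, 2)
        v = self.wv(x).view(b, t, self.h_kv, self.d_head).transpose(1, 2)
        # Only k and v (with h_kv heads) would need to be cached at inference.
        # Each group of query heads attends to the same shared KV head.
        k = k.repeat_interleave(self.h_q // self.h_kv, dim=1)
        v = v.repeat_interleave(self.h_q // self.h_kv, dim=1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(attn.transpose(1, 2).reshape(b, t, -1))
```

At 256K-token contexts the KV cache dominates memory use, so reducing the number of cached key/value heads is what makes such long sequences practical to serve.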