Qwen2 Technical Report

An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai

2024-07-16

Summary

This paper introduces Qwen2, a new series of large language models that builds on its predecessors with stronger performance and greater versatility across tasks such as language understanding and generation.

What's the problem?

As AI language models evolve, they are expected not only to understand and generate text well but also to handle a wide range of languages and tasks effectively. Earlier models often fell short on one or both of these fronts, which limited how useful they were in practice.

What's the solution?

The Qwen2 series spans multiple models from 0.5 billion to 72 billion parameters, giving users flexibility in choosing a size that fits their application. It includes both dense models and a Mixture-of-Experts model, which activates only a subset of its parameters for each input to provide more capacity at a similar compute cost. The flagship model, Qwen2-72B, performs strongly across a wide range of benchmarks, surpassing most prior open-weight models and remaining competitive with proprietary ones. In addition, Qwen2 supports roughly 30 languages, broadening its usefulness worldwide. The model weights are released on platforms such as Hugging Face to encourage community use and innovation.
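To give a concrete sense of how the released checkpoints can be used, here is a minimal sketch of loading an instruction-tuned Qwen2 model with the Hugging Face transformers library. The repository name and generation settings below are illustrative assumptions; the official model cards give the exact names and recommended parameters.

```python
# Minimal sketch: loading an instruction-tuned Qwen2 checkpoint with
# Hugging Face transformers. The repository name below is an assumed
# example; a larger variant such as Qwen2-72B-Instruct can be substituted
# given sufficient hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "Summarize Qwen2 in one sentence."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
reply = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(reply, skip_special_tokens=True))
```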

Why it matters?

This research is significant because it pushes the boundaries of what AI language models can do, making them more accessible and effective for a variety of applications. By improving multilingual capabilities and overall performance, Qwen2 can be used in diverse fields such as education, customer service, and content creation, ultimately enhancing how people interact with technology.

Abstract

This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, and exhibits competitive performance relative to proprietary models across diverse benchmarks on language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, showcases remarkable performance: 84.2 on MMLU, 37.9 on GPQA, 64.6 on HumanEval, 89.5 on GSM8K, and 82.4 on BBH as a base language model. The instruction-tuned variant, Qwen2-72B-Instruct, attains 9.1 on MT-Bench, 48.1 on Arena-Hard, and 35.7 on LiveCodeBench. Moreover, Qwen2 demonstrates robust multilingual capabilities, proficient in approximately 30 languages, spanning English, Chinese, Spanish, French, German, Arabic, Russian, Korean, Japanese, Thai, Vietnamese, and more, underscoring its versatility and global reach. To foster community innovation and accessibility, we have made the Qwen2 model weights openly available on Hugging Face and ModelScope, and the supplementary materials including example code on GitHub. These platforms also include resources for quantization, fine-tuning, and deployment, facilitating a wide range of applications and research endeavors.