
The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru

2024-08-01


Summary

This paper introduces Llama 3, a new family of advanced language models designed to handle multiple languages, coding tasks, reasoning, and tool usage. The family includes a large model with 405 billion parameters and a context window of up to 128,000 tokens, allowing it to process very long texts and perform a wide range of tasks effectively.

What's the problem?

While language models have made great strides in widely spoken languages, there is still a performance gap for models that must support many languages and complex tasks at once. Existing models often struggle with efficiency and may miss the nuances of different languages, making them less reliable for users who need accurate outputs.

What's the solution?

The authors introduce Llama 3 to address these challenges. The new family includes a dense Transformer with 405 billion parameters that can handle up to 128,000 tokens at once. Extensive evaluation shows that Llama 3 performs comparably to leading models such as GPT-4 across a variety of tasks. The authors also experiment with adding image, video, and speech capabilities via a compositional approach, making the models versatile across applications. Finally, they publicly release both pre-trained and post-trained versions of the model to support future research.
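The headline "405 billion parameters" can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes the architecture figures reported in the paper itself (126 decoder layers, model dimension 16,384, SwiGLU feed-forward dimension 53,248, 128 query heads with 8 key/value heads under grouped-query attention, and a 128,256-token vocabulary); these numbers come from the paper's architecture table, not from this summary, so treat them as assumptions:

```python
# Rough parameter count for Llama 3 405B from its reported architecture.
# All dimensions below are assumed from the paper's architecture table.
n_layers = 126
d_model = 16384
d_ffn = 53248                    # SwiGLU feed-forward dimension
n_heads = 128
n_kv_heads = 8                   # grouped-query attention
head_dim = d_model // n_heads    # 128
vocab = 128256

# Attention: Q and O projections are d_model x d_model; K and V are
# shrunk by grouped-query attention to n_kv_heads * head_dim columns.
attn = 2 * d_model * d_model + 2 * d_model * (n_kv_heads * head_dim)

# SwiGLU feed-forward uses three projection matrices (gate, up, down).
ffn = 3 * d_model * d_ffn

# Input embedding plus an untied output projection.
embed = 2 * vocab * d_model

total = n_layers * (attn + ffn) + embed
print(f"{total / 1e9:.1f}B parameters")  # prints roughly 405.8B
```

The estimate lands within a percent of the advertised 405B, which suggests the bulk of the parameters sit in the per-layer attention and feed-forward matrices rather than in the embeddings.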

Why it matters?

This research is significant because it pushes the boundaries of what language models can do, especially in multilingual contexts. By enhancing the capabilities of Llama 3, it opens up new possibilities for AI applications in education, business, and creative industries. This model could help improve communication and understanding across different languages and cultures, making technology more accessible to a broader audience.

Abstract

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.