Trillion 7B Technical Report

Sungjun Han, Juyoung Suk, Suyeong An, Hyungguk Kim, Kyuseok Kim, Wonsuk Yang, Seungtaek Choi, Jamin Shin

2025-04-24

Summary

This paper introduces Trillion-7B, a new language model that can understand and work in many different languages, even ones it has seen relatively little training data for.

What's the problem?

The problem is that most language models need huge amounts of text in every language to work well, which is tough for less common languages that don't have much data available. This makes it hard for these models to be truly multilingual and helpful for everyone.

What's the solution?

The researchers developed a technique called Cross-lingual Document Attention (XLDA), which helps the model learn from documents in one language and apply that knowledge to others. This means Trillion-7B can perform well in many languages without needing tons of training examples for each one.
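To make the idea concrete, here is a rough sketch of how attention over packed cross-lingual documents might look. This is not the paper's actual implementation: the function name xlda_attention_mask and the doc_ids/pair_ids inputs are illustrative assumptions. The sketch assumes documents are packed into one training sequence, that attention is normally blocked at document boundaries, and that XLDA-style training relaxes that block for documents that form a cross-lingual pair.

```python
import torch

def xlda_attention_mask(doc_ids: torch.Tensor, pair_ids: torch.Tensor) -> torch.Tensor:
    """Causal attention mask for a packed sequence of documents.

    doc_ids:  (seq_len,) index of the document each token belongs to.
    pair_ids: (seq_len,) shared index for documents that are cross-lingual
              counterparts of each other (e.g. an English/Korean pair).

    Plain document packing blocks attention across document boundaries.
    Here a token may also attend to earlier tokens of any document in the
    same cross-lingual pair, so the model can read one language while
    predicting the other.
    """
    seq_len = doc_ids.shape[0]
    causal = torch.ones(seq_len, seq_len).tril().bool()  # token i sees j <= i
    same_doc = doc_ids[None, :] == doc_ids[:, None]      # same document
    same_pair = pair_ids[None, :] == pair_ids[:, None]   # paired documents
    return causal & (same_doc | same_pair)

# Example: an English doc (tokens 0-2) paired with a translated doc
# (tokens 3-5), followed by an unrelated doc (tokens 6-7).
doc_ids = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2])
pair_ids = torch.tensor([0, 0, 0, 0, 0, 0, 1, 1])
print(xlda_attention_mask(doc_ids, pair_ids).int())
```

Under a mask like this, tokens in the second document can attend back to its English counterpart, which is one plausible way knowledge could transfer to a language without needing large amounts of data in it.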

Why it matters?

This matters because it makes advanced language technology more accessible to speakers of less common languages, helping bridge language gaps and making AI tools more fair and useful around the world.

Abstract

Trillion-7B is a highly efficient multilingual LLM leveraging Cross-lingual Document Attention (XLDA) for knowledge transfer and achieving competitive performance with minimal multilingual training data.