NoLoCo: No-all-reduce Low Communication Training Method for Large Models
Jari Kolehmainen, Nikolay Blagoev, John Donaghy, Oğuzhan Ersoy, Christopher Nies
2025-06-15
Summary
This paper introduces NoLoCo, a new training method for large language models that drastically reduces the heavy communication normally required when many computers train a model together. Instead of having all computers constantly synchronize with every other computer, NoLoCo lets them share information in small, random pairs, making training faster and more efficient.
What's the problem?
The problem is that training very large AI models requires many computers to work together and constantly exchange information. This communication takes a lot of time and slows everything down, especially when the network connecting the computers is slow or has high latency.
What's the solution?
The solution is NoLoCo, which avoids the usual global synchronization step: each computer updates its part of the model locally for a while, then averages its weights with just one randomly chosen partner at a time. A modified momentum step in the optimizer keeps the model copies from drifting too far apart, and random routing of activations between pipeline stages mixes information across the whole group. Together these changes reduce waiting time and speed up training.
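The core idea above can be illustrated with a toy simulation. This is a minimal sketch, not the paper's actual algorithm: the number of workers, the local step count, the scalar "parameter", and the toy gradient are all made up for illustration. It shows the communication pattern only: independent local updates followed by averaging within random pairs, with no global all-reduce.

```python
import random

# Hypothetical sketch of NoLoCo-style synchronization.
# Each worker takes a few local optimizer steps, then averages its
# parameters with ONE randomly chosen partner (a gossip step),
# instead of a global all-reduce over every worker.

NUM_WORKERS = 4   # illustrative
LOCAL_STEPS = 3   # illustrative
LR = 0.1          # illustrative

# Each worker holds its own copy of a single scalar "parameter",
# starting from different values (replicas disagree initially).
params = [float(i) for i in range(NUM_WORKERS)]

def local_update(p):
    # Toy gradient step pulling the parameter toward 10.0.
    return p - LR * (p - 10.0)

for _round in range(20):
    # 1) Independent local steps: no communication at all.
    for _ in range(LOCAL_STEPS):
        params = [local_update(p) for p in params]
    # 2) Pairwise averaging: shuffle workers into random pairs and
    #    average each pair's parameters. Each worker talks to exactly
    #    one partner per round, so there is no global barrier.
    order = list(range(NUM_WORKERS))
    random.shuffle(order)
    for a, b in zip(order[::2], order[1::2]):
        avg = 0.5 * (params[a] + params[b])
        params[a] = params[b] = avg

# The replicas end up close to each other (and to the optimum)
# even though no step ever synchronized all workers at once.
print(max(params) - min(params))
```

Because the random pairings change every round, information gradually spreads between all workers, so the replicas stay close without any single all-to-all communication step.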
Why it matters?
This matters because it allows very large AI models to be trained more quickly and cheaply without requiring super-fast networks, making advanced AI development more accessible and scalable. It saves time and computing resources while still achieving good learning performance.
Abstract
NoLoCo is a novel optimization method that eliminates explicit parameter synchronization and reduces communication overhead during the training of large language models, achieving faster convergence rates and reduced idling time compared to existing methods.