Paris: A Decentralized Trained Open-Weight Diffusion Model

Zhiying Jiang, Raihan Seraj, Marcos Villagra, Bidhan Roy

2025-10-07

Summary

This paper introduces Paris, a new system for creating images from text descriptions. What's special about Paris is that it was built using a network of computers working independently, rather than relying on a huge, centralized supercomputer.

What's the problem?

Training high-quality text-to-image diffusion models has traditionally required massive compute on a dedicated cluster of specialized GPUs linked by fast interconnects. This makes development expensive and limits who can build and use these models. Prior attempts at decentralized training have either fallen short of centralized quality or still demanded substantial resources.

What's the solution?

The researchers developed a new approach called Distributed Diffusion Training. Instead of having every machine train the same model in lockstep, they split the training data into semantically coherent clusters and assigned each cluster to a separate 'expert' model. The experts were trained completely in isolation, with no gradients, parameters, or activations exchanged during training. At inference time, a lightweight 'router' picks the most appropriate expert for each prompt, producing high-quality images. Because no constant communication or synchronization between machines is needed, training can run on a much wider range of hardware.
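To make the idea concrete, here is a minimal sketch of routing among independently trained experts. All names (`embed`, `route`, the nearest-centroid rule) are illustrative assumptions, not the paper's implementation; the paper uses a learned transformer router, whereas this toy version simply picks the expert whose training-cluster centroid is closest to the prompt embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(prompt: str, dim: int = 8) -> np.ndarray:
    """Toy prompt embedding derived from the prompt's hash.

    A stand-in for a real text encoder; consistent within one process.
    """
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

# Each expert was trained in isolation on one data cluster.
# Here an expert is represented only by its cluster centroid.
num_experts, dim = 8, 8
centroids = rng.standard_normal((num_experts, dim))

def route(prompt: str) -> int:
    """Select the expert whose cluster centroid is nearest the prompt."""
    e = embed(prompt)
    dists = np.linalg.norm(centroids - e, axis=1)
    return int(np.argmin(dists))

expert_id = route("a watercolor painting of a fox")
assert 0 <= expert_id < num_experts
```

The key property this illustrates: routing is a cheap inference-time decision, so the experts never need to communicate during training.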

Why it matters?

Paris demonstrates that it's possible to build powerful image generation models without a massive, centralized computing infrastructure, opening the field to more researchers and developers. It also shows that decentralized training can match the quality of traditional methods while using 14× less training data and 16× less compute than the prior decentralized baseline.

Abstract

We present Paris, the first publicly released diffusion model pre-trained entirely through decentralized computation. Paris demonstrates that high-quality text-to-image generation can be achieved without centrally coordinated infrastructure. Paris is open for research and commercial use. Paris required implementing our Distributed Diffusion Training framework from scratch. The model consists of 8 expert diffusion models (129M-605M parameters each) trained in complete isolation with no gradient, parameter, or intermediate activation synchronization. Rather than requiring synchronized gradient updates across thousands of GPUs, we partition data into semantically coherent clusters where each expert independently optimizes its subset while collectively approximating the full distribution. A lightweight transformer router dynamically selects appropriate experts at inference, achieving generation quality comparable to centrally coordinated baselines. Eliminating synchronization enables training on heterogeneous hardware without specialized interconnects. Empirical validation confirms that Paris's decentralized training maintains generation quality while removing the dedicated GPU cluster requirement for large-scale diffusion models. Paris achieves this using 14× less training data and 16× less compute than the prior decentralized baseline.
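The data partitioning step in the abstract can be sketched as clustering embedding vectors so that each expert receives one semantically coherent subset. This is a hedged toy version using Lloyd's k-means on random stand-in embeddings; the paper does not specify this exact procedure, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
embeddings = rng.standard_normal((1000, 16))  # stand-in caption embeddings
k = 8  # one cluster per expert, matching the paper's 8 experts

# Simple k-means (Lloyd's algorithm): assign, then recompute centroids.
centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
for _ in range(20):
    # distance from every sample to every centroid: shape (1000, k)
    d = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    for j in range(k):
        members = embeddings[labels == j]
        if len(members):
            centroids[j] = members.mean(axis=0)

# Expert j trains independently on its partition; no gradients,
# parameters, or activations ever cross cluster boundaries.
partitions = [np.flatnonzero(labels == j) for j in range(k)]
assert sum(len(p) for p in partitions) == len(embeddings)
```

The point of the sketch: once the data is partitioned up front, each expert's training job is fully self-contained, which is what removes the need for synchronized gradient updates across machines.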