
Black-Box On-Policy Distillation of Large Language Models

Tianzhu Ye, Li Dong, Zewen Chi, Xun Wu, Shaohan Huang, Furu Wei

2025-11-14


Summary

This paper introduces Generative Adversarial Distillation (GAD), a method for training smaller, more manageable large language models (LLMs) to imitate larger, more powerful ones using only the text the bigger model produces, without any access to its inner workings.

What's the problem?

Currently, when you want to train a smaller LLM to mimic a bigger one, you usually only have access to the text the bigger model *produces*, not the internal signals (its logits or parameters) it uses to produce that text. This makes it hard to get the smaller model to perform as well as the original, and existing methods for this 'black-box' setting often fall short of the teacher's quality and can be unstable during training.

What's the solution?

GAD sets up a competition between the student LLM (the model being trained) and a 'discriminator'. The student tries to generate responses that look like they came from the teacher LLM, while the discriminator tries to tell student responses apart from teacher responses. It works like a game: the student gets better at fooling the discriminator, and the discriminator gets better at spotting fakes. Because the discriminator is trained alongside the student, it acts as a reward model that scores the student's own responses as they are generated (on-policy), giving feedback that adapts as the student improves and stays stable throughout training.
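
To make the idea concrete, here is a minimal, runnable toy sketch of this kind of adversarial training loop in PyTorch. It is an illustration under simplifying assumptions, not the paper's implementation: the "student" and "discriminator" are tiny GRU models over a synthetic vocabulary, and `fake_teacher_sample` is a placeholder standing in for querying the black-box teacher. The structure mirrors the description above: the discriminator learns to tell teacher text from student text, and its score is used as an on-policy reward to update the student with a simple policy-gradient (REINFORCE) step.

```python
# Toy sketch of adversarial black-box distillation (GAD-style), not the paper's code.
# All model and function names here are hypothetical stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, EMB = 32, 12, 64

class ToyStudent(nn.Module):
    """Tiny autoregressive generator standing in for the student LLM."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, EMB, batch_first=True)
        self.head = nn.Linear(EMB, VOCAB)

    def sample(self, batch):
        """Sample sequences on-policy and return them with their log-probs."""
        tok = torch.zeros(batch, 1, dtype=torch.long)        # BOS token = 0
        h, toks, logps = None, [], []
        for _ in range(SEQ_LEN):
            out, h = self.rnn(self.emb(tok), h)
            dist = torch.distributions.Categorical(logits=self.head(out[:, -1]))
            tok = dist.sample().unsqueeze(1)
            toks.append(tok)
            logps.append(dist.log_prob(tok.squeeze(1)))
        return torch.cat(toks, dim=1), torch.stack(logps, dim=1).sum(dim=1)

class ToyDiscriminator(nn.Module):
    """Scores a sequence: a high score means 'looks like teacher output'."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, EMB, batch_first=True)
        self.head = nn.Linear(EMB, 1)

    def forward(self, seqs):
        _, h = self.rnn(self.emb(seqs))
        return self.head(h[-1]).squeeze(-1)                  # unnormalized logit

def fake_teacher_sample(batch):
    """Placeholder for querying the black-box teacher (text outputs only)."""
    return torch.randint(0, VOCAB // 4, (batch, SEQ_LEN))

student, disc = ToyStudent(), ToyDiscriminator()
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

for step in range(200):
    teacher_seqs = fake_teacher_sample(16)
    student_seqs, logps = student.sample(16)

    # 1) Discriminator update: teacher text is "real" (1), student text is "fake" (0).
    d_loss = (
        F.binary_cross_entropy_with_logits(disc(teacher_seqs), torch.ones(16))
        + F.binary_cross_entropy_with_logits(disc(student_seqs.detach()), torch.zeros(16))
    )
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Student update: REINFORCE with the discriminator score as reward,
    #    pushing the student toward responses the (co-evolving) discriminator
    #    currently finds teacher-like.
    with torch.no_grad():
        reward = torch.sigmoid(disc(student_seqs))
        reward = reward - reward.mean()                      # simple baseline
    s_loss = -(reward * logps).mean()
    opt_s.zero_grad()
    s_loss.backward()
    opt_s.step()
```

In the paper the same two-player structure is applied at the scale of full LLMs, with the teacher queried only for its text outputs; the sketch above only shows the shape of the training loop.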

Why it matters?

This research matters because it offers a more effective way to build smaller LLMs that perform close to much larger, proprietary models. Smaller models are cheaper to run and more accessible, so this could put powerful AI technology within reach of more people. In the paper's experiments, a Qwen2.5-14B-Instruct student trained with GAD becomes comparable to its teacher, GPT-5-Chat, on automatic evaluation.

Abstract

Black-box distillation creates student large language models (LLMs) by learning from a proprietary teacher model's text outputs alone, without access to its internal logits or parameters. In this work, we introduce Generative Adversarial Distillation (GAD), which enables on-policy and black-box distillation. GAD frames the student LLM as a generator and trains a discriminator to distinguish its responses from the teacher LLM's, creating a minimax game. The discriminator acts as an on-policy reward model that co-evolves with the student, providing stable, adaptive feedback. Experimental results show that GAD consistently surpasses the commonly used sequence-level knowledge distillation. In particular, Qwen2.5-14B-Instruct (student) trained with GAD becomes comparable to its teacher, GPT-5-Chat, on the LMSYS-Chat automatic evaluation. The results establish GAD as a promising and effective paradigm for black-box LLM distillation.
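
For contrast, the baseline the abstract refers to, sequence-level knowledge distillation, amounts to ordinary supervised fine-tuning on teacher-generated text. Below is a minimal toy sketch of that baseline, again with placeholder toy models rather than the paper's code: the student is trained with token-level cross-entropy on sequences the teacher produced (off-policy), with no discriminator and no feedback on the student's own samples.

```python
# Toy sketch of the sequence-level KD baseline: fine-tune the student on
# teacher-generated text with plain cross-entropy. Names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, EMB = 32, 12, 64

class ToyLM(nn.Module):
    """Tiny language model standing in for the student."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, EMB, batch_first=True)
        self.head = nn.Linear(EMB, VOCAB)

    def forward(self, tokens):
        out, _ = self.rnn(self.emb(tokens))
        return self.head(out)                       # next-token logits

def teacher_corpus(batch):
    """Placeholder for text previously collected from the black-box teacher."""
    return torch.randint(0, VOCAB // 4, (batch, SEQ_LEN))

student = ToyLM()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(200):
    seqs = teacher_corpus(16)
    logits = student(seqs[:, :-1])                   # predict each next token
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), seqs[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```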