Inverse Bridge Matching Distillation
Nikita Gushchin, David Li, Daniil Selikhanovych, Evgeny Burnaev, Dmitry Baranchuk, Alexander Korotin
2025-02-05

Summary
This paper introduces a new way to speed up diffusion bridge models (DBMs), a family of generative models used for image-to-image translation tasks like super-resolution and image restoration. The authors distill a slow, many-step DBM into a fast generator, in some cases a single-step one, using a formulation they call inverse bridge matching.
What's the problem?
DBMs, like most modern diffusion and flow models, need many network evaluations to produce a single image, which makes inference slow and expensive. According to the authors, existing DBM distillation techniques only partially address this: none of them simultaneously handles both conditional and unconditional DBMs, compresses the model into a one-step generator, and trains without access to clean target images.
What's the solution?
The researchers formulate distillation as an inverse bridge matching problem and derive a tractable objective that can be optimized in practice. Unlike prior DBM distillation techniques, their method distills both conditional and unconditional DBMs, produces a one-step generator, and needs only corrupted images for training. They evaluate it on a wide range of setups, including super-resolution, JPEG restoration, and sketch-to-image, reporting 4x to 100x faster inference and, in some setups, better generation quality than the teacher. A simplified sketch of such a distillation loop is given below.
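The following is a minimal, hypothetical PyTorch sketch of the general recipe described above: a one-step generator trained against a frozen teacher bridge model using only corrupted images. All names (Generator, bridge_point, distill_step) are illustrative, the teacher is assumed to predict the clean endpoint x0, and the simple endpoint-matching loss is a stand-in; the paper derives a different, tractable inverse-bridge-matching objective.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """One-step student: maps a corrupted image (plus noise) to a clean image."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x1, z):
        return self.net(torch.cat([x1, z], dim=1))

def bridge_point(x0, x1, t, eps=1.0):
    """Sample x_t on the Brownian bridge between x0 (clean) and x1 (corrupted)."""
    t = t.view(-1, 1, 1, 1)
    mean = (1 - t) * x0 + t * x1
    std = (eps * t * (1 - t)).sqrt()
    return mean + std * torch.randn_like(x0)

def distill_step(generator, teacher, x1, opt):
    """One update of the student; note that only corrupted images x1 are used."""
    z = torch.randn_like(x1)
    x0_hat = generator(x1, z)  # one-step sample of a "clean" image
    t = torch.rand(x1.size(0), device=x1.device).clamp(1e-3, 1 - 1e-3)
    xt = bridge_point(x0_hat, x1, t)
    with torch.no_grad():
        x0_teacher = teacher(xt, t)  # frozen teacher's clean-endpoint prediction
    # Simplified proxy loss: pull the student's output toward what the
    # teacher reconstructs from points on the student-induced bridge.
    loss = (x0_hat - x0_teacher).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The structural points match the paper's claims even though the loss is a stand-in: the student is a single forward pass, and the loop never touches a clean training image; all supervision comes from the frozen teacher.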
Why it matters?
Slow inference is one of the main obstacles to deploying diffusion-based models in practice. By cutting the number of network evaluations from dozens or hundreds down to as few as one, while matching or even beating the teacher's quality on some setups, this work makes DBMs far more usable for real-world image restoration and translation, where latency and compute cost matter.
Abstract
Learning diffusion bridge models is easy; making them fast and practical is an art. Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation. However, like many modern diffusion and flow models, DBMs suffer from the problem of slow inference. To address it, we propose a novel distillation technique based on the inverse bridge matching formulation and derive a tractable objective to solve it in practice. Unlike previously developed DBM distillation techniques, the proposed method can distill both conditional and unconditional types of DBMs, distill models into a one-step generator, and use only corrupted images for training. We evaluate our approach for both conditional and unconditional types of bridge matching on a wide set of setups, including super-resolution, JPEG restoration, sketch-to-image, and other tasks, and show that our distillation technique allows us to accelerate the inference of DBMs from 4x to 100x and even provide better generation quality than the teacher model, depending on the particular setup.
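To make the claimed 4x to 100x speedup concrete, here is a hypothetical sketch contrasting the teacher's iterative sampling with the distilled student's single call. It assumes the teacher predicts the clean endpoint x0 and is sampled with a standard ancestral Brownian-bridge sampler; the names and the exact sampler are illustrative, not taken from the paper.

```python
import torch

@torch.no_grad()
def teacher_sample(teacher, x1, n_steps=100, eps=1.0):
    """N-step ancestral bridge sampling from t=1 (corrupted) to t=0 (clean)."""
    x = x1
    ts = torch.linspace(1.0, 0.0, n_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        tb = torch.full((x.size(0),), float(t), device=x.device)
        x0_hat = teacher(x, tb)  # teacher's clean-endpoint prediction at time t
        # Posterior of a Brownian bridge pinned at x0_hat (time 0) and x (time t).
        mean = (s / t) * x + (1 - s / t) * x0_hat
        std = (eps * s * (t - s) / t).sqrt()
        x = mean + std * torch.randn_like(x) if s > 0 else mean
    return x

@torch.no_grad()
def student_sample(generator, x1):
    """The distilled one-step generator replaces the whole loop above."""
    return generator(x1, torch.randn_like(x1))
```

In this form the teacher costs n_steps network evaluations per image while the student costs one, which is where acceleration on that order comes from.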