
Trees to Flows and Back: Unifying Decision Trees and Diffusion Models

Sai Niranjan Ramachandran, Suvrit Sra

2026-05-04


Summary

This research connects two seemingly different types of machine learning models: decision trees and diffusion models. Decision trees make choices in a step-by-step, organized way, while diffusion models generate data by learning to gradually remove noise that was added during training. The paper shows the two are actually mathematically related under certain limiting conditions.
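For readers who want the underlying math: "adding noise and then removing it" is usually formalized with a denoising score matching loss. The version below is the standard textbook objective, not necessarily the exact loss this paper optimizes:

```latex
% Standard denoising score matching at a noise level \sigma:
% the model s_\theta learns to point from a noised sample back
% toward the clean data.
\mathcal{L}_{\mathrm{DSM}}(\theta) =
  \mathbb{E}_{x_0 \sim p_{\mathrm{data}},\; \epsilon \sim \mathcal{N}(0, I)}
  \left[ \left\| s_\theta(x_0 + \sigma\epsilon) + \frac{\epsilon}{\sigma} \right\|_2^2 \right]
```

Minimizing this loss makes s_theta approximate the score (the gradient of the log-density) of the noised data, which is exactly what the sampler follows when it "removes" the noise.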

What's the problem?

Decision trees and diffusion models have traditionally been treated as entirely separate model families. This makes it hard to tell whether they share underlying principles, or whether techniques from one could benefit the other. In particular, there was no unified way to understand how both kinds of models learn and improve.

What's the solution?

The researchers found a mathematical link between how decision trees are built and how diffusion models operate. Both, it turns out, pursue a common objective the paper calls 'Global Trajectory Score Matching' (GTSM), and an idealized form of 'gradient boosting' is an asymptotically optimal way to pursue it. Building on this, they created two new tools: 'treeflow', which generates tabular data with higher fidelity at roughly twice the speed, and 'dsmtree', which distills the hierarchical decision logic of a tree into a neural network. A rough sketch of the first idea appears below.
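The paper's GTSM objective and its boosting analysis aren't reproduced here, so the sketch below only conveys the general flavor of "gradient-boosted trees as a score model": it fits standard scikit-learn boosted trees to a denoising score matching target on toy data, then samples with Langevin dynamics. The single noise level, the `score` helper, and all hyperparameters are illustrative assumptions, not the paper's actual treeflow algorithm.

```python
# Illustrative sketch only: boosted trees fit to a denoising score
# matching target (NOT the paper's treeflow/GTSM procedure).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
x0 = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(2000, 2))  # toy tabular data

sigma = 0.3                       # single noise level, for simplicity
eps = rng.standard_normal(x0.shape)
xt = x0 + sigma * eps             # noised samples
target = -eps / sigma             # DSM target: score of the noised density

# One boosted-tree ensemble per coordinate (sklearn's GBR is scalar-output).
models = [
    GradientBoostingRegressor(n_estimators=200, max_depth=3).fit(xt, target[:, d])
    for d in range(x0.shape[1])
]

def score(x):
    # Tree-based estimate of grad log p_sigma(x).
    return np.stack([m.predict(x) for m in models], axis=1)

# Crude Langevin sampler driven by the tree-based score.
x = 2.0 * rng.standard_normal((500, 2))
step = 0.01
for _ in range(200):
    x = x + step * score(x) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
# x now roughly approximates samples from the (noised) data distribution.
```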

Why it matters?

This work matters because it deepens our understanding of machine learning: by connecting decision trees and diffusion models, it opens the door to combining the strengths of both approaches. The new tools, 'treeflow' and 'dsmtree', show concrete benefits, such as faster, higher-fidelity generation of tabular data and a way to carry the interpretable, hierarchical logic of decision trees into neural networks.

Abstract

Decision trees and diffusion models are ostensibly disparate model classes, one discrete and hierarchical, the other continuous and dynamic. This work unifies the two by establishing a crisp mathematical correspondence between hierarchical decision trees and diffusion processes in appropriate limiting regimes. Our unification reveals a shared optimization principle: Global Trajectory Score Matching (GTSM), for which gradient boosting (in an idealized version) is asymptotically optimal. We underscore the conceptual value of our work through two key practical instantiations: treeflow, which achieves competitive generation quality on tabular data with higher fidelity and a 2× computational speedup, and dsmtree, a novel distillation method that transfers hierarchical decision logic into neural networks, matching teacher performance within 2% on many benchmarks.
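As a rough picture of what "transferring hierarchical decision logic into neural networks" can look like, here is a generic teacher–student distillation sketch in scikit-learn: a tree teacher is fit first, and a small MLP student regresses onto the teacher's outputs. The paper's dsmtree method is not specified here, so treat every choice below (models, sizes, the fidelity metric) as an illustrative assumption.

```python
# Generic tree-to-network distillation sketch (not the paper's dsmtree).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(5000, 4))
y = np.sin(X[:, 0]) + (X[:, 1] > 0).astype(float)  # toy target

# Teacher: a depth-limited decision tree with explicit hierarchical logic.
teacher = DecisionTreeRegressor(max_depth=6).fit(X, y)

# Student: an MLP trained on the teacher's predictions rather than the raw
# labels, so it imitates the tree's piecewise decision behavior.
soft_labels = teacher.predict(X)
student = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, soft_labels)

fidelity = np.mean((student.predict(X) - soft_labels) ** 2)
print(f"student-teacher MSE: {fidelity:.4f}")
```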