MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong
2026-02-26
Summary
This paper introduces MolHIT, a graph-based diffusion model for generating molecules, aimed at AI-driven discovery of new drugs and materials.
What's the problem?
Current AI models that generate molecules as graphs, a common way to represent molecular structure, often produce molecules that are not chemically valid or that lack the properties scientists are looking for. They generally perform worse than models that represent molecules as a 1D string of characters, even though graphs seem like a more natural fit for molecular structure.
What's the solution?
The researchers developed MolHIT, which is built on what they call a Hierarchical Discrete Diffusion Model. It extends standard discrete diffusion with additional categories that encode prior knowledge about chemistry. MolHIT also uses a decoupled atom encoding, which splits atom types according to their chemical roles in the molecule, helping it generate more valid and useful structures.
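To make the idea of separating atom types by role more concrete, here is a minimal sketch. It is not MolHIT's actual encoding: the summary does not say which roles are used, so the `Atom` type, the choice of aromaticity and formal charge as "roles", and both token functions are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical atom record; the fields are illustrative assumptions."""
    element: str
    aromatic: bool = False
    charge: int = 0


def coarse_token(atom: Atom) -> str:
    """Element-only encoding: every nitrogen maps to the same token."""
    return atom.element


def decoupled_token(atom: Atom) -> str:
    """Assumed role-aware encoding: element plus aromaticity and charge."""
    # Lowercase marks an aromatic atom, mirroring the SMILES convention.
    symbol = atom.element.lower() if atom.aromatic else atom.element
    if atom.charge > 0:
        symbol += "+" * atom.charge
    elif atom.charge < 0:
        symbol += "-" * (-atom.charge)
    return symbol


atoms = [
    Atom("N"),                  # neutral aliphatic nitrogen
    Atom("N", aromatic=True),   # aromatic nitrogen, as in pyridine
    Atom("N", charge=1),        # positively charged nitrogen
    Atom("C", aromatic=True),   # aromatic carbon
]
coarse = [coarse_token(a) for a in atoms]
decoupled = [decoupled_token(a) for a in atoms]
print(coarse)     # ['N', 'N', 'N', 'C']
print(decoupled)  # ['N', 'n', 'N+', 'c']
```

With the element-only encoding, three chemically different nitrogens share one token, so the model has to infer their roles from context. In the assumed decoupled encoding each role gets its own token, which is the kind of separation the summary describes.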
Why does it matter?
MolHIT reaches near-perfect chemical validity on the MOSES benchmark, meaning that almost all generated structures obey basic chemical rules such as valence. The authors report this as a first for graph diffusion models, and MolHIT also surpasses strong 1D (string-based) baselines across multiple metrics. It performs well on downstream tasks too, including multi-property guided generation and scaffold extension. Better graph-based generators of this kind could help speed up the search for new drugs and materials with desired properties.
Abstract
Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle to meet the desired properties compared to 1D modeling. In this work, we introduce MolHIT, a powerful molecular graph generation framework that overcomes long-standing performance limitations in existing methods. MolHIT is based on the Hierarchical Discrete Diffusion Model, which generalizes discrete diffusion to additional categories that encode chemical priors, and decoupled atom encoding that splits the atom types according to their chemical roles. Overall, MolHIT achieves new state-of-the-art performance on the MOSES dataset with near-perfect validity for the first time in graph diffusion, surpassing strong 1D baselines across multiple metrics. We further demonstrate strong performance in downstream tasks, including multi-property guided generation and scaffold extension.
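As a rough illustration of what "additional categories that encode chemical priors" could look like in a discrete diffusion forward process, the sketch below corrupts atom tokens in two stages: first to a coarse chemical group, then to a mask token. This is not the transition kernel from the paper. The `GROUP_OF` mapping, the `MASK` token, the `corrupt` function, and the noise schedule are all hypothetical choices made for this example.

```python
import random

# Hypothetical two-level hierarchy: atom type -> coarse group -> mask.
GROUP_OF = {
    "C": "<carbon>",
    "N": "<hetero>", "O": "<hetero>", "S": "<hetero>",
    "F": "<halogen>", "Cl": "<halogen>", "Br": "<halogen>",
}
MASK = "<mask>"


def corrupt(token: str, t: float, rng: random.Random) -> str:
    """Sample a noised version of `token` at time t in [0, 1].

    Assumed schedule: with probability t the token is coarsened to its
    group, and a coarsened token is then masked with probability t.
    """
    if rng.random() >= t:
        return token            # kept as the original atom type
    if rng.random() >= t:
        return GROUP_OF[token]  # intermediate state: only the group survives
    return MASK                 # fully absorbed state


rng = random.Random(0)
atoms = ["C", "C", "N", "O", "Cl", "C", "F"]
for t in (0.0, 0.5, 1.0):
    print(t, [corrupt(a, t, rng) for a in atoms])
```

At `t = 0.0` every token is unchanged and at `t = 1.0` every token is masked. In between, some tokens sit in the intermediate group state, so a denoiser trained on this process would first have to decide the coarse chemical category and only later the exact atom type.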