PhononBench: A Large-Scale Phonon-Based Benchmark for Dynamical Stability in Crystal Generation
Xiao-Qi Han, Ze-Feng Gao, Peng-Jie Guo, Zhong-Yi Lu
2025-12-25
Summary
This paper introduces PhononBench, a benchmark for testing whether crystal structures designed by artificial intelligence are physically realistic. It checks whether these AI-generated crystals are dynamically stable, that is, whether they would hold together rather than spontaneously distort. This crucial property is often overlooked.
What's the problem?
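AI models are increasingly used to design new materials, but their outputs are rarely checked for stability at scale, because doing so requires costly calculations of how the atoms vibrate (the crystal's phonons). A crystal must be dynamically stable: when its atoms are nudged slightly away from their positions, the energy should rise and push them back, rather than fall and let the structure distort into something else. In phonon terms, no vibrational mode may have an imaginary frequency. Many structures produced by existing AI models fail this test, so they could not exist as generated. The paper finds that only about 26% of all AI-generated crystals are dynamically stable, and even the best model, MatterGen, reaches just 41%. The rate stays low when a model is asked to target a property: with MatterGen conditioned on a specific band gap, it peaks at only 23.5%.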
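Dynamical stability is usually judged from phonon frequencies computed on a grid of wavevectors (q-points) spanning the Brillouin zone: the structure counts as stable if no mode has an imaginary frequency. The following is a minimal sketch of such a check, not PhononBench's code. It assumes imaginary frequencies are stored as negative numbers (a common convention, used for example by phonopy) and uses a hypothetical 0.05 THz tolerance to absorb numerical noise in the acoustic modes near the Gamma point; the thresholds PhononBench actually applies are not stated here.

```python
import numpy as np


def is_dynamically_stable(frequencies_thz, tol_thz=0.05):
    """Return True if no phonon mode is imaginary beyond a small tolerance.

    frequencies_thz: array of shape (n_qpoints, n_modes), where an imaginary
    frequency i*|w| is assumed to be stored as the negative value -|w|.
    tol_thz: hypothetical threshold that absorbs small numerical noise,
    typically in the acoustic modes close to the Gamma point.
    """
    freqs = np.asarray(frequencies_thz, dtype=float)
    return bool(freqs.min() >= -tol_thz)


# Illustrative values only, not PhononBench data.
stable_example = [[0.0, 0.0, 0.0, 3.2], [-0.01, 1.1, 2.4, 5.0]]
unstable_example = [[0.0, 0.0, 0.0, 2.9], [-1.3, 0.8, 2.1, 4.7]]
print(is_dynamically_stable(stable_example))    # True: -0.01 THz is within tolerance
print(is_dynamically_stable(unstable_example))  # False: a -1.3 THz (imaginary) mode
```

The tolerance is a design choice: set it too tight and numerical noise flags stable crystals as unstable; set it too loose and genuinely soft modes slip through.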
What's the solution?
The researchers built PhononBench on MatterSim, a machine-learned interatomic potential that matches the accuracy of density functional theory (DFT) in phonon predictions across more than 10,000 materials while running far faster than DFT. This makes it practical to calculate phonons and assess dynamical stability for 108,843 crystal structures generated by six leading crystal generation models. They also analyzed how stability depends on the way generation is controlled, for example when a model targets a specific band gap or a particular space group (symmetry). All generated structures, phonon results, and evaluation workflows are to be released publicly so that other researchers can use them.
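PhononBench obtains phonons from the MatterSim potential. The sketch below uses neither MatterSim nor any phonon package. As a hypothetical stand-in, it applies a Lennard-Jones pair potential to a one-dimensional monatomic chain to illustrate the same sequence of steps: a force constant taken from the curvature of the potential energy, the dynamical matrix D(q) = (2k/m)(1 - cos qa) sampled across the Brillouin zone, and signed frequencies whose negative values mark imaginary modes. All quantities are in arbitrary reduced units.

```python
import numpy as np


def lj_pair_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy: a toy stand-in for a learned interatomic potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def force_constant(energy_fn, spacing, h=1e-4):
    """Nearest-neighbour force constant k = V''(a) from a central finite difference."""
    return (energy_fn(spacing + h) - 2.0 * energy_fn(spacing) + energy_fn(spacing - h)) / h**2


def chain_phonon_frequencies(spacing, mass=1.0, n_q=101, energy_fn=lj_pair_energy):
    """Signed phonon frequencies of a 1D monatomic chain across its Brillouin zone.

    With nearest-neighbour interactions only, the dynamical matrix is the scalar
    D(q) = (2k/m)(1 - cos(q a)), whose value is omega^2(q). A negative omega^2
    is returned as a negative frequency, marking an imaginary mode.
    """
    k = force_constant(energy_fn, spacing)
    q = np.linspace(-np.pi / spacing, np.pi / spacing, n_q)  # first Brillouin zone
    omega_sq = (2.0 * k / mass) * (1.0 - np.cos(q * spacing))
    return q, np.sign(omega_sq) * np.sqrt(np.abs(omega_sq))


a_eq = 2.0 ** (1.0 / 6.0)  # minimum of the Lennard-Jones potential (sigma = 1)
q, w_eq = chain_phonon_frequencies(a_eq)
_, w_str = chain_phonon_frequencies(1.5)  # chain stretched past the inflection point
print(w_eq.min() >= 0.0)  # True: no imaginary modes at equilibrium
print(w_str.min() < 0.0)  # True: the stretched chain has imaginary modes
```

At the potential minimum a = 2^(1/6) σ the curvature is positive and every mode is real. Stretching the chain to a = 1.5σ, beyond the potential's inflection point, makes the curvature negative, so the modes become imaginary everywhere except at q = 0. Real crystals need three-dimensional force constants between many atom pairs, and an interatomic potential such as MatterSim supplies these.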
Why it matters?
This work highlights a major weakness in current AI-driven materials design: generated structures are often not physically viable. PhononBench provides a benchmark and a pool of 28,119 crystal structures that are phonon-stable across the entire Brillouin zone. Together, these give AI developers clear evaluation criteria and a resource for improving their models so that they design materials that can actually be made and used. This moves the field closer to designing truly novel and useful materials with AI.
Abstract
In this work, we introduce PhononBench, the first large-scale benchmark for dynamical stability in AI-generated crystals. Leveraging the recently developed MatterSim interatomic potential, which achieves DFT-level accuracy in phonon predictions across more than 10,000 materials, PhononBench enables efficient large-scale phonon calculations and dynamical-stability analysis for 108,843 crystal structures generated by six leading crystal generation models. PhononBench reveals a widespread limitation of current generative models in ensuring dynamical stability: the average dynamical-stability rate across all generated structures is only 25.83%, with the top-performing model, MatterGen, reaching just 41.0%. Further case studies show that in property-targeted generation, illustrated here by band-gap conditioning with MatterGen, the dynamical-stability rate remains as low as 23.5% even at the optimal band-gap condition of 0.5 eV. In space-group-controlled generation, higher-symmetry crystals exhibit better stability (e.g., cubic systems achieve rates up to 49.2%), yet the average stability across all controlled generations is still only 34.4%. An important additional outcome of this study is the identification of 28,119 crystal structures that are phonon-stable across the entire Brillouin zone, providing a substantial pool of reliable candidates for future materials exploration. By establishing the first large-scale dynamical-stability benchmark, this work systematically highlights the current limitations of crystal generation models and offers essential evaluation criteria and guidance for their future development toward the design and discovery of physically viable materials. All model-generated crystal structures, phonon calculation results, and the high-throughput evaluation workflows developed in PhononBench will be openly released at https://github.com/xqh19970407/PhononBench
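The abstract's headline figures are stability rates: the fraction of generated structures with no imaginary phonon modes, broken down by generative model or by crystal system. The following minimal sketch shows one way to tabulate such rates from per-structure results. The record fields (`model`, `space_group`, `stable`) and the model names are hypothetical, and the space-group ranges are the standard ones for the seven crystal systems. The sketch reports both a pooled rate and the mean of per-model rates, because the two can differ when models contribute different numbers of structures.

```python
from collections import defaultdict

# Highest space-group number in each crystal system (standard ranges, 1-230).
_SYSTEM_UPPER_BOUNDS = [
    (2, "triclinic"), (15, "monoclinic"), (74, "orthorhombic"),
    (142, "tetragonal"), (167, "trigonal"), (194, "hexagonal"), (230, "cubic"),
]


def crystal_system(space_group):
    """Map a space-group number (1-230) to its crystal system."""
    for upper, name in _SYSTEM_UPPER_BOUNDS:
        if 1 <= space_group <= upper:
            return name
    raise ValueError(f"invalid space-group number: {space_group}")


def stability_rates(records, key):
    """Fraction of dynamically stable structures in each group defined by key(record)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_stable, n_total]
    for rec in records:
        tally = counts[key(rec)]
        tally[0] += int(rec["stable"])
        tally[1] += 1
    return {group: n_stable / n_total for group, (n_stable, n_total) in counts.items()}


def pooled_rate(records):
    """Stable fraction over all structures, regardless of which model produced them."""
    return sum(int(rec["stable"]) for rec in records) / len(records)


# Hypothetical records for illustration; these are not results from the paper.
records = [
    {"model": "model_A", "space_group": 225, "stable": True},
    {"model": "model_A", "space_group": 14, "stable": False},
    {"model": "model_A", "space_group": 229, "stable": True},
    {"model": "model_B", "space_group": 221, "stable": False},
    {"model": "model_B", "space_group": 1, "stable": False},
]
by_model = stability_rates(records, key=lambda r: r["model"])
by_system = stability_rates(records, key=lambda r: crystal_system(r["space_group"]))
print(by_model)                                # model_A: 2/3, model_B: 0.0
print(by_system)                               # cubic: 2/3, monoclinic and triclinic: 0.0
print(pooled_rate(records))                    # 0.4
print(sum(by_model.values()) / len(by_model))  # mean of per-model rates: 1/3
```

In this toy data the pooled rate (0.4) and the mean of per-model rates (1/3) disagree, which is why a benchmark report should state which average it uses.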