Generating π-Functional Molecules Using STGG+ with Active Learning
Alexia Jolicoeur-Martineau, Yan Zhang, Boris Knyazev, Aristide Baratin, Cheng-Hao Liu
2025-02-21
Summary
This paper introduces STGG+AL, a method for generating molecules with targeted properties, particularly strong light absorption. It places a supervised generative model, STGG+, inside an active learning loop, so the model is repeatedly fine-tuned on its own evaluated outputs.
What's the problem?
Scientists want to discover new molecules whose properties lie outside the range covered by existing datasets. Supervised learning methods tend to produce molecules that are too similar to what is already known, while reinforcement learning methods often exploit the reward ("reward hacking") and produce unrealistic molecules that cannot be synthesized.
What's the solution?
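The researchers developed STGG+AL, which places STGG+, a state-of-the-art supervised generative model, in an active learning loop: the model generates molecules, the molecules are evaluated, and the model is fine-tuned on the results. They focused on two tasks: generating molecules that absorb light very strongly (high oscillator strength), and designing molecules that absorb in the near-infrared (NIR) range with reasonable oscillator strength. The generated molecules were then validated in silico with time-dependent density functional theory.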
Why does it matter?
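This matters because it could help us discover new materials for things like solar panels or medical imaging. According to the authors, the method is more effective than existing approaches such as reinforcement learning at generating novel molecules with high oscillator strength, and it could speed up the discovery of new materials for various technologies. The researchers also open-sourced their active-learning code, their Conjugated-xTB dataset of 2.9 million π-conjugated molecules, and the function they use to approximate oscillator strength and absorption wavelength, which could help other scientists make further discoveries.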
Abstract
Generating novel molecules with out-of-distribution properties is a major challenge in molecular discovery. While supervised learning methods generate high-quality molecules similar to those in a dataset, they struggle to generalize to out-of-distribution properties. Reinforcement learning can explore new chemical spaces but often engages in "reward hacking" and generates non-synthesizable molecules. In this work, we address this problem by integrating a state-of-the-art supervised learning method, STGG+, in an active learning loop. Our approach iteratively generates, evaluates, and fine-tunes STGG+ to continuously expand its knowledge. We denote this approach STGG+AL. We apply STGG+AL to the design of organic π-functional materials, specifically two challenging tasks: 1) generating highly absorptive molecules characterized by high oscillator strength and 2) designing absorptive molecules with reasonable oscillator strength in the near-infrared (NIR) range. The generated molecules are validated and rationalized in silico with time-dependent density functional theory. Our results demonstrate that our method is highly effective in generating novel molecules with high oscillator strength, unlike existing approaches such as reinforcement learning (RL). We open-source our active-learning code along with our Conjugated-xTB dataset containing 2.9 million π-conjugated molecules and the function for approximating the oscillator strength and absorption wavelength (based on sTDA-xTB).
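The generate, evaluate, fine-tune cycle described in the abstract can be illustrated with a toy sketch. Nothing below is the authors' implementation: `ToyGenerator`, `toy_oracle`, and `active_learning` are hypothetical names, the "molecules" are plain numbers, the oracle is a simple function standing in for an expensive evaluator such as sTDA-xTB, and fine-tuning on the top-scoring candidates is an illustrative simplification rather than a detail taken from the paper.

```python
import random
from statistics import mean


def toy_oracle(x: float) -> float:
    """Hypothetical stand-in for a property evaluator; the score peaks at x = 5."""
    return -(x - 5.0) ** 2


class ToyGenerator:
    """Hypothetical stand-in for a generative model such as STGG+."""

    def __init__(self, center: float, spread: float = 1.0, seed: int = 0):
        self.center = center
        self.spread = spread
        self.rng = random.Random(seed)  # seeded so the sketch is reproducible

    def generate(self, n: int) -> list[float]:
        # Sample candidates around what the model currently "knows".
        return [self.rng.gauss(self.center, self.spread) for _ in range(n)]

    def fine_tune(self, samples: list[float]) -> None:
        # Assumption: fine-tuning is reduced to re-centering on the given samples.
        self.center = mean(samples)


def active_learning(generator, oracle, rounds=20, n_samples=200, keep=20):
    """Run the loop: generate candidates, score them, fine-tune on the best."""
    history = []
    for _ in range(rounds):
        candidates = generator.generate(n_samples)
        best = sorted(candidates, key=oracle, reverse=True)[:keep]
        generator.fine_tune(best)
        history.append(oracle(generator.center))
    return history


gen = ToyGenerator(center=0.0)  # initial "training data" sits far from the optimum
history = active_learning(gen, toy_oracle)
print(f"final center: {gen.center:.2f}, final score: {history[-1]:.4f}")
```

In this toy setting the generator starts around 0, where no sample scores well, and each round shifts it toward the out-of-distribution optimum at 5. The point is only to show how repeated evaluation and fine-tuning can move a model beyond its initial data.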