Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

Gang Liu, Michael Sun, Wojciech Matusik, Meng Jiang, Jie Chen

2024-10-10

Summary

This paper introduces Llamole, a new multimodal large language model for inverse molecular design with retrosynthetic planning, capable of generating interleaved text and graph representations of molecules.

What's the problem?

While large language models (LLMs) have made progress in handling images, they struggle with graphs, which are essential for representing molecular structures. This limitation makes it hard to apply these models in fields like drug discovery and materials science, where understanding the relationships between molecular components is crucial.

What's the solution?

To solve this, the authors developed Llamole, the first multimodal LLM that can generate text and molecular graphs in an interleaved way. It combines a base LLM with a Graph Diffusion Transformer and Graph Neural Networks to produce detailed molecular designs and predict chemical reactions, with the LLM deciding when to hand control to each graph module. For planning how to synthesize a target molecule, Llamole integrates A* search guided by LLM-based cost functions. In the authors' experiments, Llamole significantly outperformed 14 adapted LLMs across 12 metrics for controllable molecular design and retrosynthetic planning.
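The control flow described above, where the LLM decides mid-generation when to activate a graph module, can be sketched roughly as follows. This is a minimal illustration, not the paper's actual interface: the trigger-token name, the fake token stream, and the graph-module stub are all invented here.

```python
# Hypothetical sketch of interleaved text/graph generation: the LLM emits a
# special trigger token (name invented) that hands control to a graph module,
# whose output is spliced back into the text stream.

def fake_llm_stream():
    # Stand-in for autoregressive LLM output that contains a trigger token.
    yield from ["Design", "a", "molecule:", "<design_mol>", "then", "stop."]

def graph_diffusion_module(context):
    # Stand-in for a Graph Diffusion Transformer: returns a generated
    # molecule graph rendered as a SMILES string (benzene, for illustration).
    return "C1=CC=CC=C1"

# Map from trigger token to the graph module it activates.
TRIGGERS = {"<design_mol>": graph_diffusion_module}

def generate(llm_stream):
    out = []
    for token in llm_stream:
        handler = TRIGGERS.get(token)
        if handler:
            out.append(handler(out))  # graph output re-enters the text stream
        else:
            out.append(token)
    return " ".join(out)
```

In the real system the handoff happens inside the model's decoding loop and the graph module conditions on the LLM's hidden state, but the dispatch pattern is the same idea.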

Why it matters?

This research is important because it enhances the ability of AI to assist in complex tasks like drug design and materials development. By letting scientists specify and analyze molecular structures through natural language, Llamole can accelerate discoveries in chemistry and related fields, ultimately leading to new innovations in medicine and technology.

Abstract

While large language models (LLMs) have integrated images, adapting them to graphs remains challenging, limiting their applications in materials and drug design. This difficulty stems from the need for coherent autoregressive generation across texts and graphs. To address this, we introduce Llamole, the first multimodal LLM capable of interleaved text and graph generation, enabling molecular inverse design with retrosynthetic planning. Llamole integrates a base LLM with the Graph Diffusion Transformer and Graph Neural Networks for multi-conditional molecular generation and reaction inference within texts, while the LLM, with enhanced molecular understanding, flexibly controls activation among the different graph modules. Additionally, Llamole integrates A* search with LLM-based cost functions for efficient retrosynthetic planning. We create benchmarking datasets and conduct extensive experiments to evaluate Llamole against in-context learning and supervised fine-tuning. Llamole significantly outperforms 14 adapted LLMs across 12 metrics for controllable molecular design and retrosynthetic planning.
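The abstract's "A* search with LLM-based cost functions" can be illustrated with a toy sketch. Everything below is invented for illustration: the reaction table, the step costs, and the `llm_cost_estimate` heuristic (a stand-in for the paper's learned cost functions, which come from the LLM itself).

```python
import heapq
from itertools import count

# Illustrative one-step retrosynthesis options: molecule -> list of
# (precursor tuple, step cost). Names and costs are made up for this sketch.
REACTIONS = {
    "target": [(("intermediate_a", "reagent_x"), 1.0),
               (("intermediate_b",), 2.0)],
    "intermediate_a": [(("buyable_1",), 1.0)],
    "intermediate_b": [(("buyable_2", "buyable_3"), 1.0)],
}
BUYABLE = {"reagent_x", "buyable_1", "buyable_2", "buyable_3"}

def llm_cost_estimate(molecule):
    """Stand-in for an LLM-based cost function: a cheap estimate of the cost
    to synthesize `molecule` from purchasable building blocks."""
    return 0.0 if molecule in BUYABLE else 1.0

def astar_retrosynthesis(target):
    """A* over sets of unsolved molecules; returns the total route cost,
    or None if no route exists within the known reactions."""
    tie = count()  # tie-breaker so the heap never compares frozensets
    start = frozenset({target})
    frontier = [(llm_cost_estimate(target), 0.0, next(tie), start)]
    seen = set()
    while frontier:
        _, g, _, open_mols = heapq.heappop(frontier)
        open_mols = frozenset(m for m in open_mols if m not in BUYABLE)
        if not open_mols:
            return g  # everything left is purchasable: route complete
        if open_mols in seen:
            continue
        seen.add(open_mols)
        mol = next(iter(open_mols))  # expand one unsolved molecule
        for precursors, step_cost in REACTIONS.get(mol, []):
            nxt = (open_mols - {mol}) | set(precursors)
            g2 = g + step_cost
            h2 = sum(llm_cost_estimate(m) for m in nxt if m not in BUYABLE)
            heapq.heappush(frontier, (g2 + h2, g2, next(tie), nxt))
    return None
```

The priority of a state is g + h: the cost of reactions chosen so far plus the estimated remaining cost summed over still-unsolved molecules, which is what lets an accurate LLM heuristic prune the search.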