
L^2M^3OF: A Large Language Multimodal Model for Metal-Organic Frameworks

Jiyu Cui, Fang Wu, Haokai Zhao, Minggao Feng, Xenophon Evangelopoulos, Andrew I. Cooper, Yejin Choi

2025-10-31

Summary

This paper introduces L2M3OF, a new multimodal artificial intelligence model designed to help discover metal-organic frameworks (MOFs), materials useful for applications such as carbon capture and hydrogen storage.

What's the problem?

While large language models are good at understanding and generating text, they struggle with scientific problems that require reasoning about complex 3D structures, such as the arrangement of atoms in a material. Designing MOFs is especially hard: the number of possible atomic arrangements is enormous, the structures must obey strict rules of coordination geometry and topology, and much of the relevant expertise is tacit knowledge held by human experts that is rarely written down in a form a computer can learn from. Text-based AI alone is therefore not enough to design these materials effectively.

What's the solution?

The researchers created L2M3OF, which is different because it does not rely on text alone. It jointly processes the material's crystal structure (how its atoms are arranged), textual descriptions, and domain knowledge. A pre-trained crystal encoder converts the 3D structure into embeddings, and a lightweight projection layer compresses those embeddings into tokens that the language model can align with text instructions during training. They then tested the model against some of the most powerful closed-source AI models currently available, such as GPT-5, Gemini-2.5-Pro, and DeepSeek-R1.
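
To make this concrete, here is a minimal sketch in PyTorch of the kind of projection step described above: a structure embedding from a pre-trained crystal encoder is mapped by a small projection layer into the language model's token space and concatenated with the embedded text instruction. The module names, dimensions, and number of structure tokens are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' code) of projecting crystal-encoder
# embeddings into an LLM's token embedding space. All names, dimensions,
# and the concatenation scheme are assumptions for illustration only.
import torch
import torch.nn as nn


class StructureProjector(nn.Module):
    """Maps one crystal embedding to a short sequence of 'structure tokens'."""

    def __init__(self, crystal_dim: int = 256, llm_dim: int = 4096, n_tokens: int = 8):
        super().__init__()
        self.n_tokens = n_tokens
        # A lightweight MLP that expands a single structure embedding into
        # n_tokens vectors in the LLM's hidden dimension.
        self.proj = nn.Sequential(
            nn.Linear(crystal_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim * n_tokens),
        )

    def forward(self, crystal_emb: torch.Tensor) -> torch.Tensor:
        # crystal_emb: (batch, crystal_dim) -> (batch, n_tokens, llm_dim)
        batch = crystal_emb.shape[0]
        return self.proj(crystal_emb).view(batch, self.n_tokens, -1)


# Toy usage: fuse structure tokens with embedded text-instruction tokens.
crystal_emb = torch.randn(2, 256)         # output of a frozen crystal encoder
text_emb = torch.randn(2, 32, 4096)       # output of the LLM's token embedding layer

projector = StructureProjector()
structure_tokens = projector(crystal_emb)                    # (2, 8, 4096)
llm_input = torch.cat([structure_tokens, text_emb], dim=1)   # (2, 40, 4096)
print(llm_input.shape)
```

The design choice this illustrates is that only the small projection layer needs to be trained to bridge the two modalities, while the crystal encoder and the language model can remain largely pre-trained, which keeps the approach parameter-efficient.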

Why it matters?

L2M3OF performed better than these other AI models at predicting properties of MOFs and generating new knowledge about them, even though it's a smaller model. This shows that combining different types of information – structure, text, and knowledge – is crucial for making progress in materials science and could lead to the development of better AI systems for discovering new and useful materials.

Abstract

Large language models have demonstrated remarkable reasoning capabilities across diverse natural language tasks. However, comparable breakthroughs in scientific discovery are more limited, because understanding complex physical phenomena demands multifaceted representations far beyond language alone. A compelling example is the design of functional materials such as MOFs, which are critical for a range of impactful applications like carbon capture and hydrogen storage. Navigating their vast and intricate design space in language-based representations interpretable by LLMs is challenging due to the numerous possible three-dimensional atomic arrangements and strict reticular rules of coordination geometry and topology. Despite promising early results in LLM-assisted discovery for simpler materials systems, MOF design remains heavily reliant on tacit human expertise rarely codified in textual information alone. To overcome this barrier, we introduce L2M3OF, the first multimodal LLM for MOFs. L2M3OF integrates crystal representation learning with language understanding to process structural, textual, and knowledge modalities jointly. L2M3OF employs a pre-trained crystal encoder with a lightweight projection layer to compress structural information into a token space, enabling efficient alignment with language instructions. To facilitate training and evaluation, we curate a structure-property-knowledge database of crystalline materials and benchmark L2M3OF against state-of-the-art closed-source LLMs such as GPT-5, Gemini-2.5-Pro and DeepSeek-R1. Experiments show that L2M3OF outperforms leading text-based closed-source LLMs in property prediction and knowledge generation tasks, despite using far fewer parameters. These results highlight the importance of multimodal approaches for porous material understanding and establish L2M3OF as a foundation for next-generation AI systems in materials discovery.