Improving Chemical Understanding of LLMs via SMILES Parsing

Yunhui Jang, Jaehyung Kim, Sungsoo Ahn

2025-05-28


Summary

This paper introduces CLEANMOL, a framework that helps large language models (LLMs) understand chemical structures by teaching them to parse SMILES, a text notation that writes a molecule as a single line of characters.

What's the problem?

Most language models struggle to understand the structural details of molecules written in SMILES format, which limits how much they can help with tasks in molecular science.

What's the solution?

To solve this, the researchers reformulated SMILES parsing as a set of structured tasks that the model can learn step by step. Training on these tasks improves the model's structural comprehension and its performance on Mol-Instructions.
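As a rough illustration of what a structured SMILES parsing task could look like, the sketch below builds a question-and-answer pair whose answer is computed directly from the SMILES string. This summary does not list the paper's actual tasks or data format, so the task (counting ring-closure bonds), the function names `count_ring_closures` and `make_ring_task`, and the `instruction`/`answer` fields are all assumptions made for illustration. It relies only on the SMILES convention that a ring bond is written as a matching pair of labels (a digit, or `%` followed by two digits) outside bracket atoms.

```python
def count_ring_closures(smiles: str) -> int:
    """Count ring-closure bonds in a SMILES string.

    Illustrative only: assumes a syntactically valid SMILES string and
    does no chemical validation.
    """
    labels = 0
    in_bracket = False
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            in_bracket = True       # digits inside [...] are isotopes, H counts, charges
        elif ch == "]":
            in_bracket = False
        elif not in_bracket:
            if ch == "%":
                labels += 1         # two-digit ring label such as %12
                i += 2              # skip the two digits that follow '%'
            elif ch.isdigit():
                labels += 1         # single-digit ring label
        i += 1
    # Each ring bond is written as two matching labels (open and close).
    return labels // 2


def make_ring_task(smiles: str) -> dict:
    """Build a hypothetical instruction/answer pair for one molecule."""
    return {
        "instruction": (
            f"How many ring-closure bonds does the SMILES string {smiles} contain?"
        ),
        "answer": str(count_ring_closures(smiles)),
    }


if __name__ == "__main__":
    for smi in ["CCO", "c1ccccc1", "c1ccc2ccccc2c1"]:
        print(make_ring_task(smi))
```

Because the answer is derived from the string itself, examples like this can be generated without manual annotation, which is one reason a parsing-style task is convenient as training data.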

Why does it matter?

Better structural understanding makes AI more useful in chemistry and drug discovery, so scientists can get more accurate help when working with molecules.

Abstract

CLEANMOL, a novel framework, enhances structural comprehension in large language models for molecular science by formulating SMILES parsing into structured tasks, improving performance on Mol-Instructions.
