Hallucinations Can Improve Large Language Models in Drug Discovery
Shuzhou Yuan, Michael Färber
2025-01-24

Summary
This paper explores whether 'hallucinations' in AI language models can actually improve drug discovery. It's like using a computer's imagination to come up with new ideas for medicines.
What's the problem?
Usually, when AI models make things up or 'hallucinate', it's seen as a bad thing. But in creative fields like finding new medicines, these made-up ideas might actually be helpful. Until now, no one had really tested whether these AI hallucinations could be useful in drug discovery.
What's the solution?
The researchers tested their idea with seven different AI models. They had each model describe a molecule's SMILES string (a text code for its chemical structure) in plain language, knowing these descriptions would include some made-up details. They then added the descriptions to the prompt for drug discovery classification tasks. The models actually performed better when the prompt included these 'hallucinated' or imagined details: one model, Llama-3.1-8B, improved its ROC-AUC score by 18.35% compared to running without hallucinations.
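The approach can be pictured as a two-stage prompt: first ask an LLM to describe a SMILES string in natural language, then feed that description back into the prompt for the classification question. Below is a minimal sketch of that flow; the call_llm stub, the prompt wording, and the example molecule and task question are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of the two-stage prompting pipeline, assuming a generic
# chat-completion backend. Prompt wording, the example SMILES, and the task
# question are illustrative assumptions, not the paper's exact prompts.

def call_llm(prompt: str) -> str:
    """Placeholder: route the prompt to any LLM (GPT-4o, Llama-3.1-8B, ...)."""
    return "<model response>"  # swap in a real API or local model call


def describe_molecule(smiles: str) -> str:
    """Stage 1: ask the LLM for a natural-language description of the SMILES.
    The description may contain hallucinated (chemically unfaithful) details."""
    return call_llm(f"Describe the following molecule in natural language: {smiles}")


def classify_molecule(smiles: str, description: str, question: str) -> str:
    """Stage 2: include the (possibly hallucinated) description in the prompt
    for a downstream yes/no property-prediction task."""
    prompt = (
        f"SMILES: {smiles}\n"
        f"Description: {description}\n"
        f"Question: {question}\nAnswer yes or no."
    )
    return call_llm(prompt)


if __name__ == "__main__":
    smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # aspirin, used only as an example input
    description = describe_molecule(smiles)
    answer = classify_molecule(
        smiles,
        description,
        question="Is this molecule likely to inhibit HIV replication?",
    )
    print(answer)
```

The key design point is that the stage-1 description is passed along unfiltered, so any imaginative or inaccurate details it contains become part of the stage-2 prompt.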
Why it matters?
This matters because it could change how we use AI in science, especially in creating new medicines. Instead of always trying to make AI stick to exact facts, we might sometimes want to let it be more creative. This could lead to discovering new drugs faster or finding medicines we might not have thought of otherwise. It's like giving scientists a super-powered brainstorming partner that can come up with wild but potentially useful ideas.
Abstract
Concerns about hallucinations in Large Language Models (LLMs) have been raised by researchers, yet their potential in areas where creativity is vital, such as drug discovery, merits exploration. In this paper, we come up with the hypothesis that hallucinations can improve LLMs in drug discovery. To verify this hypothesis, we use LLMs to describe the SMILES string of molecules in natural language and then incorporate these descriptions as part of the prompt to address specific tasks in drug discovery. Evaluated on seven LLMs and five classification tasks, our findings confirm the hypothesis: LLMs can achieve better performance with text containing hallucinations. Notably, Llama-3.1-8B achieves an 18.35% gain in ROC-AUC compared to the baseline without hallucination. Furthermore, hallucinations generated by GPT-4o provide the most consistent improvements across models. Additionally, we conduct empirical analyses and a case study to investigate key factors affecting performance and the underlying reasons. Our research sheds light on the potential use of hallucinations for LLMs and offers new perspectives for future research leveraging LLMs in drug discovery.
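For context, the 18.35% figure refers to ROC-AUC, the standard metric for binary classification tasks like these. The snippet below shows how that metric is typically computed with scikit-learn; the labels and predicted scores are made-up illustrative numbers, not results from the paper.

```python
# Illustrative only: how a ROC-AUC score like the one reported in the abstract
# is typically computed for a binary classification task. The labels and
# predicted probabilities below are made-up numbers, not the paper's data.
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 1, 0, 1, 0, 1, 0]                     # ground-truth molecule labels
y_score = [0.2, 0.8, 0.6, 0.65, 0.9, 0.4, 0.7, 0.1]   # model-predicted probabilities

print(f"ROC-AUC: {roc_auc_score(y_true, y_score):.3f}")
```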