Assessing LLMs for Serendipity Discovery in Knowledge Graphs: A Case for Drug Repurposing

Mengying Wang, Chenhui Ma, Ao Jiao, Tuo Liang, Pengjun Lu, Shrinidhi Hegde, Yu Yin, Evren Gurkan-Cavusoglu, Yinghui Wu

2025-11-18

Summary

This paper investigates how well large language models, which are good at answering questions based on knowledge graphs, can actually suggest *unexpected* but useful answers, not just the most obvious ones.

What's the problem?

Current systems that use large language models for knowledge graph question answering are good at finding answers that are sensible and directly related to the question, but they rarely uncover surprising or novel connections. In other words, they lack the ability to suggest answers that are insightful yet unexpected. There was also no good way to measure how 'serendipitous' an answer is – how much surprising, yet valuable, insight it offers.

What's the solution?

The researchers created a framework called SerenQA to specifically test this ability. They defined what 'serendipity' means in this context, scoring answers on how relevant, novel, and surprising they are. They also built an expert-annotated dataset derived from a clinical knowledge graph, focused on finding new uses for existing drugs (drug repurposing). SerenQA tests models in stages: first retrieving relevant information, then reasoning about connections within that information, and finally identifying truly surprising discoveries. They then evaluated current large language models using this framework.
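To make the idea of scoring answers on relevance, novelty, and surprise concrete, here is a minimal sketch of how such a composite serendipity score could be computed. The equal weighting and the `Candidate` fields are illustrative assumptions for this example, not the paper's actual metric.

```python
# Hypothetical sketch of a serendipity score as a weighted combination of
# the three components named in the paper: relevance, novelty, surprise.
# The weights and component values here are assumptions, not SerenQA's metric.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    relevance: float  # how well the answer fits the question, in [0, 1]
    novelty: float    # how new the connection is, in [0, 1]
    surprise: float   # how unexpected the answer is, in [0, 1]


def serendipity_score(c: Candidate,
                      w_rel: float = 1 / 3,
                      w_nov: float = 1 / 3,
                      w_sur: float = 1 / 3) -> float:
    """Weighted sum of the three components (weights are illustrative)."""
    return w_rel * c.relevance + w_nov * c.novelty + w_sur * c.surprise


# An obvious answer scores high on relevance alone; a serendipitous one
# balances relevance with novelty and surprise.
obvious = Candidate("known indication", relevance=0.9, novelty=0.1, surprise=0.1)
repurposed = Candidate("repurposing lead", relevance=0.7, novelty=0.8, surprise=0.6)
best = max([obvious, repurposed], key=serendipity_score)
```

A relevance-only ranker would pick the first candidate; under this toy score, the repurposing lead wins because novelty and surprise also count.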

Why it matters?

This work is important because it highlights a weakness in current AI systems. While they can access and process a lot of information, they aren't very good at creative problem-solving or making unexpected connections. Improving this 'serendipity' could lead to breakthroughs in fields like medicine, where finding new uses for old drugs could save lives and resources. It also provides a benchmark and tools for other researchers to improve these models.

Abstract

Large Language Models (LLMs) have greatly advanced knowledge graph question answering (KGQA), yet existing systems are typically optimized for returning highly relevant but predictable answers. A missing yet desired capability is to exploit LLMs to suggest surprising and novel ("serendipitous") answers. In this paper, we formally define the serendipity-aware KGQA task and propose the SerenQA framework to evaluate LLMs' ability to uncover unexpected insights in scientific KGQA tasks. SerenQA includes a rigorous serendipity metric based on relevance, novelty, and surprise, along with an expert-annotated benchmark derived from the Clinical Knowledge Graph, focused on drug repurposing. Additionally, it features a structured evaluation pipeline encompassing three subtasks: knowledge retrieval, subgraph reasoning, and serendipity exploration. Our experiments reveal that while state-of-the-art LLMs perform well on retrieval, they still struggle to identify genuinely surprising and valuable discoveries, underscoring significant room for future improvement. Our curated resources and extended version are released at: https://cwru-db-group.github.io/serenQA.