Large Language Models for Scientific Idea Generation: A Creativity-Centered Survey

Fatemeh Shahhosseini, Arash Marioriyad, Ali Momen, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban, Shaghayegh Haghjooy Javanmard

2025-11-17

Summary

This paper explores how well large language models (LLMs) can come up with new scientific ideas, a task at the heart of making discoveries and advancing knowledge. It surveys the different techniques used to get LLMs to generate these ideas and examines how 'creative' those ideas actually are.

What's the problem?

Generating truly novel and useful scientific ideas is a tough challenge. It's not just about being creative, but also making sure the ideas are grounded in what we already know and could potentially be proven true. While LLMs are good at producing text that *sounds* scientific, it's unclear if they can consistently generate genuinely new and valuable ideas, and we don't fully understand *how* they might do so.

What's the solution?

The researchers reviewed the many methods people are using to make LLMs better at scientific idea generation. They grouped these methods into five main categories: feeding extra scientific knowledge to the LLM (external knowledge augmentation), carefully crafting the prompts given to the LLM (prompt-based steering), giving the LLM more computation while it generates ideas (inference-time scaling), having multiple LLM agents work together (multi-agent collaboration), and fine-tuning the model's internal parameters (parameter-level adaptation). They then used established frameworks for understanding creativity to analyze what kind of ideas each method tends to produce and where the creativity is coming from.
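To make two of these families concrete, here is a minimal sketch combining external knowledge augmentation (grounding the prompt in retrieved literature snippets) and prompt-based steering (instructions that nudge the model toward novel combinations). The function name, prompt wording, and example snippets are illustrative assumptions, not a method from the survey itself.

```python
# Illustrative sketch: grounding an idea-generation prompt in retrieved
# background knowledge, then steering the model toward novelty.
# All names and wording here are hypothetical examples.

def build_ideation_prompt(topic: str, retrieved_snippets: list[str]) -> str:
    """Compose an idea-generation prompt grounded in prior findings."""
    # External knowledge augmentation: inject retrieved snippets as context.
    background = "\n".join(f"- {s}" for s in retrieved_snippets)
    # Prompt-based steering: explicitly ask for a novel, testable combination.
    return (
        f"Background findings on {topic}:\n{background}\n\n"
        "Propose one novel, testable hypothesis that combines at least "
        "two of the findings above in a way not stated in any of them, "
        "and briefly explain how it could be empirically validated."
    )

prompt = build_ideation_prompt(
    "protein folding",
    ["Language models can predict structure from sequence alone.",
     "Evolutionary couplings constrain fold topology."],
)
print(prompt)
```

The steering instruction ("combines at least two of the findings") loosely mirrors Boden's combinatorial creativity, while the grounding step aims at the empirical soundness the survey emphasizes.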

Why it matters?

This work is important because it helps us understand the potential of LLMs to assist in scientific discovery. By categorizing and analyzing different approaches, it points the way towards developing more reliable and powerful tools that can help scientists come up with groundbreaking ideas and solve complex problems, ultimately accelerating the pace of scientific progress.

Abstract

Scientific idea generation lies at the heart of scientific discovery and has driven human progress, whether by solving unsolved problems or proposing novel hypotheses to explain unknown phenomena. Unlike standard scientific reasoning or general creative generation, idea generation in science is a multi-objective and open-ended task, where the novelty of a contribution is as essential as its empirical soundness. Large language models (LLMs) have recently emerged as promising generators of scientific ideas, capable of producing coherent and factual outputs with surprising intuition and acceptable reasoning, yet their creative capacity remains inconsistent and poorly understood. This survey provides a structured synthesis of methods for LLM-driven scientific ideation, examining how different approaches balance creativity with scientific soundness. We categorize existing methods into five complementary families: external knowledge augmentation, prompt-based distributional steering, inference-time scaling, multi-agent collaboration, and parameter-level adaptation. To interpret their contributions, we employ two complementary frameworks: Boden's taxonomy of Combinatorial, Exploratory, and Transformational creativity to characterize the level of ideas each family is expected to generate, and Rhodes' 4Ps framework (Person, Process, Press, and Product) to locate the aspect or source of creativity that each method emphasizes. By aligning methodological advances with creativity frameworks, this survey clarifies the state of the field and outlines key directions toward reliable, systematic, and transformative applications of LLMs in scientific discovery.