The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Max Ruiz Luyten, Mihaela van der Schaar

2026-01-05

Summary

This paper investigates a problem with how we currently train powerful large language models (LLMs) – specifically, that focusing too much on getting the *right* answer can actually make them less creative and flexible in their thinking.

What's the problem?

LLMs are often improved by having them generate multiple possible solutions to a problem and then reinforcing the best ones. While this boosts accuracy, the researchers found that it leads to the model getting stuck in a rut, repeatedly choosing the same types of solutions and losing the ability to explore new, potentially better approaches. It's like practicing only one type of math problem and then struggling with anything slightly different. This happens because the model's 'thinking process' becomes predictable and lacks diversity.
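The rich-get-richer dynamic described above can be sketched in a toy simulation (the strategies, success rates, and update rule here are hypothetical illustrations, not taken from the paper): a policy over four reasoning "strategies" is trained by reinforcing only the highest-scoring sample each round, and its entropy collapses as one strategy crowds out the rest.

```python
import math
import random

random.seed(0)

# Toy setup: a softmax policy over 4 reasoning "strategies".
# Strategy 0 is slightly better, but the others are viable and
# worth keeping around for out-of-distribution problems.
success_rate = [0.8, 0.7, 0.7, 0.6]
logits = [0.0, 0.0, 0.0, 0.0]

def probs(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

h_start = entropy(probs(logits))

# Best-of-N reinforcement: sample N candidates, score them on
# correctness alone, and push probability mass toward the winner.
for step in range(200):
    p = probs(logits)
    candidates = random.choices(range(4), weights=p, k=8)
    rewards = [success_rate[c] + random.gauss(0, 0.05) for c in candidates]
    best = candidates[rewards.index(max(rewards))]
    logits[best] += 0.1  # reinforce only the highest-scoring trace

h_end = entropy(probs(logits))
print(f"entropy before: {h_start:.3f}, after: {h_end:.3f}")
```

Because whichever strategy wins gets sampled more often next round, the feedback loop concentrates nearly all probability on a single strategy, even though the alternatives were only slightly worse.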

What's the solution?

The researchers developed a new framework called Distributional Creative Reasoning (DCR) to understand and fix this issue. DCR treats training as steering a flow of probability over the space of all possible solution paths, rather than just rewarding individual answers. From this viewpoint, existing methods like STaR, GRPO, and DPO turn out to be special cases of one general objective. Using DCR, the researchers proved why optimizing solely for correctness destroys diversity, and then designed training procedures that keep models both accurate *and* creative, preventing the 'collapse' of diverse thinking.
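One standard way to prevent such a collapse is to add an entropy term to the training objective, which the abstract lists as a special case of the DCR framework. The sketch below (same toy setup as above; the objective J = E[r] + tau * H is a generic entropy-regularized form, not the paper's exact recipe) ascends that objective with exact gradients and keeps all four strategies alive:

```python
import math

# Same toy setup: a softmax policy over 4 strategies with known rewards.
# Instead of reinforcing only the winner, ascend a hypothetical
# entropy-regularized objective J = E[r] + tau * H(pi).
rewards = [0.8, 0.7, 0.7, 0.6]
tau = 0.2   # entropy weight: higher keeps the policy more diverse
lr = 0.5
logits = [0.0] * 4

def probs(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

for _ in range(500):
    p = probs(logits)
    h = entropy(p)
    er = sum(pi * ri for pi, ri in zip(p, rewards))
    for i in range(4):
        # d/dlogit_i of E[r]  =  p_i * (r_i - E[r])
        # d/dlogit_i of H(pi) = -p_i * (log p_i + H)
        grad = p[i] * (rewards[i] - er) - tau * p[i] * (math.log(p[i]) + h)
        logits[i] += lr * grad

p = probs(logits)
print(f"final probs: {[round(pi, 3) for pi in p]}, entropy: {entropy(p):.3f}")
```

The policy converges to p_i proportional to exp(r_i / tau): the best strategy gets the most mass, but the others retain nonzero probability, so entropy stays near its maximum instead of collapsing toward zero.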

Why it matters?

This work is important because it provides a fundamental understanding of how to build LLMs that aren't just good at memorizing and repeating information, but can actually think outside the box and solve problems in novel ways. It offers practical advice for training models that are both reliable and capable of genuine creativity, which is crucial for tackling complex real-world challenges.

Abstract

State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: sampling diverse chains of thought and reinforcing the highest-scoring ones, optimizing mainly for correctness. We analyze how this design choice is sensitive to the collapse of the model's distribution over reasoning paths, slashing semantic entropy and undermining creative problem-solving. To analyze this failure, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow through probability measures on solution traces. STaR, GRPO, and DPO, as well as entropy bonuses and other methods, all constitute special cases of the same loss. The framework delivers three core results: (i) the diversity decay theorem, describing how correctness-based objectives lead to distinct modes of diversity decay for STaR, GRPO, and DPO; (ii) designs that ensure convergence to a stable and diverse policy, effectively preventing collapse; and (iii) simple, actionable recipes to achieve this in practice. DCR thus offers the first principled recipe for LLMs that remain both correct and creative.
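The abstract does not reproduce the variational objective itself. As a rough sketch, entropy-regularized objectives of the following generic form are a standard way to trade correctness against diversity (the paper's actual objective may differ):

```latex
J(\pi) \;=\; \mathbb{E}_{z \sim \pi}\!\big[r(z)\big] \;+\; \tau\, \mathcal{H}(\pi),
\qquad
\mathcal{H}(\pi) \;=\; -\sum_{z} \pi(z)\,\log \pi(z),
```

where z ranges over solution traces, r(z) scores correctness, and tau controls how strongly diversity is preserved; taking tau to zero recovers pure correctness optimization and, with it, the collapse the paper analyzes.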