Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator

Qian Cao, Xiting Wang, Yuzhuo Yuan, Yahui Liu, Fang Luo, Ruihua Song

2025-05-30

Summary

This paper introduces a new way to judge how creative different pieces of writing are, using a dataset called CreataSet and an AI evaluator named CrEval that compares texts in pairs to decide which one is more creative.

What's the problem?

The problem is that creativity in writing is hard to measure because it's so subjective, and current AI evaluators often disagree with what humans consider creative, which makes their judgments unreliable.

What's the solution?

The researchers built a system based on pairwise comparisons: it looks at two pieces of writing at a time and decides which one is more creative, much like a human judge would. They trained CrEval, an LLM-based evaluator, on the CreataSet dataset using this comparison format, which helped the model align better with what people actually see as creative. A minimal sketch of the pairwise-judging idea is shown below.
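To make the pairwise idea concrete, here is a minimal sketch in Python. The prompt wording, the `compare` and `rank_by_wins` helpers, and the win-count ranking are illustrative assumptions, not the paper's actual protocol: the real CrEval is a fine-tuned LLM, while this sketch only shows how pairwise judgments might be collected and aggregated, with the LLM call passed in as a generic `judge` callable.

```python
from itertools import combinations
from typing import Callable, Dict, List

# Hypothetical prompt template for a single pairwise creativity judgment.
PROMPT = (
    "You are a judge of textual creativity. Compare the two texts below "
    "and answer with a single letter: 'A' if Text A is more creative, "
    "'B' if Text B is more creative.\n\n"
    "Text A:\n{a}\n\nText B:\n{b}\n\nAnswer:"
)

def compare(judge: Callable[[str], str], text_a: str, text_b: str) -> str:
    """Ask the judge model which of two texts is more creative ('A' or 'B')."""
    answer = judge(PROMPT.format(a=text_a, b=text_b)).strip().upper()
    return "A" if answer.startswith("A") else "B"

def rank_by_wins(judge: Callable[[str], str], texts: List[str]) -> List[str]:
    """Rank texts by how many pairwise comparisons each one wins."""
    wins: Dict[int, int] = {i: 0 for i in range(len(texts))}
    for i, j in combinations(range(len(texts)), 2):
        winner = i if compare(judge, texts[i], texts[j]) == "A" else j
        wins[winner] += 1
    return [texts[i] for i in sorted(wins, key=wins.get, reverse=True)]

if __name__ == "__main__":
    def dummy_judge(prompt: str) -> str:
        # Stand-in for a real LLM call; always prefers Text B.
        return "B"

    samples = [
        "A cat sat on a mat.",
        "The moon spilled silver ink across the sleeping town.",
    ]
    print(rank_by_wins(dummy_judge, samples))
```

In practice the paper trains the evaluator on CreataSet rather than relying on a generic model, and raw win counts are just the simplest way to turn pairwise outcomes into a ranking; the core idea the sketch illustrates is that comparing two texts at a time is an easier, more human-like judgment than assigning each text an absolute creativity score.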

Why it matters?

This is important because it makes it possible to more accurately and fairly judge creativity in writing using AI, which can help with things like creative writing contests, education, and improving AI-generated content to be more interesting and original.

Abstract

A novel pairwise-comparison framework using the CreataSet dataset trains CrEval, an LLM-based evaluator whose assessments of textual creativity align significantly better with human judgments.