WebNovelBench: Placing LLM Novelists on the Web Novel Distribution

Leon Lin, Jun Zheng, Haidong Wang

2025-05-22

Summary

This paper introduces WebNovelBench, a new benchmark that tests how well AI language models can write long stories by comparing their output to thousands of popular Chinese web novels and judging storytelling skill across several key dimensions.

What's the problem?

It's hard to fairly and thoroughly measure how good AI models are at writing long, engaging stories, because most existing tests are too small, not diverse enough, or lack clear, objective criteria for judging the stories.

What's the solution?

The researchers built WebNovelBench from over 4,000 Chinese web novels and created a system where an AI judge scores each story on eight key storytelling qualities, such as creativity and plot. These scores are then compared against the human-written novels to see where each AI model falls on the distribution of human work.
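The core idea of placing a model on the human distribution can be sketched as follows. This is an illustrative toy, not the paper's exact method: the dimension count, the simple mean aggregation, and the score values here are assumptions for demonstration.

```python
# Hypothetical sketch: aggregate an LLM's eight per-dimension scores
# into one number, then place it on the distribution of scores for
# human-written web novels. The mean aggregation and 1-10 scale are
# illustrative assumptions, not the paper's exact procedure.
from statistics import mean
from bisect import bisect_left

def aggregate(dimension_scores):
    """Collapse the eight per-dimension scores into a single score."""
    return mean(dimension_scores)

def percentile_rank(model_score, human_scores):
    """Fraction of human works the model's aggregate score exceeds."""
    ranked = sorted(human_scores)
    return bisect_left(ranked, model_score) / len(ranked)

# Toy example: five human works and one model output, each scored on
# eight storytelling dimensions by an LLM judge.
human = [aggregate(s) for s in [[6]*8, [7]*8, [5]*8, [8]*8, [4]*8]]
model = aggregate([7, 6, 8, 7, 6, 7, 8, 7])  # -> 7.0
print(f"model outscores {percentile_rank(model, human):.0%} of human works")
```

The percentile framing is what lets the benchmark report results as "this model writes at the level of the top X% of web novels" rather than as a raw score with no reference point.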

Why does it matter?

This matters because it provides a reliable, large-scale way to measure how close AI models are to real human writers, which helps improve AI storytelling and pushes future models toward better, more engaging stories.

Abstract

WebNovelBench evaluates LLM storytelling capabilities using a large-scale dataset of Chinese web novels, assessing narrative quality across eight dimensions through an LLM-as-Judge framework.