
Modifying Large Language Model Post-Training for Diverse Creative Writing

John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele, Yuqian Sun, Max Kreminski

2025-03-24


Summary

This paper changes how AI language models are post-trained so that, for creative writing, they produce more varied and original outputs without losing quality.

What's the problem?

AI language models are usually trained to maximize generation quality, so their creative writing tends to converge on similar, predictable text; they struggle to produce diverse and original outputs.

What's the solution?

The researchers add a measure called deviation (how different a training sample is from the other samples written for the same prompt) to the training objective. This pushes the model to learn more from rare, high-quality examples, which helps it generate more diverse outputs without sacrificing quality.
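To make the idea concrete, here is a minimal sketch of deviation-weighted training. The paper defines deviation as the degree of difference between a sample and all other samples for the same prompt; the word-overlap similarity and the linear weighting below are illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch: compute per-sample "deviation" and use it to
# upweight rare samples in a preference-tuning loss (e.g. DPO/ORPO).
# The similarity measure and weighting scheme are assumptions for
# illustration only.

def word_overlap_similarity(a: str, b: str) -> float:
    """Jaccard similarity over word sets (a crude stand-in for a
    real text-similarity measure)."""
    wa, wb = set(a.split()), set(b.split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def deviation(sample: str, others: list[str]) -> float:
    """Mean dissimilarity of `sample` from all other samples
    generated for the same prompt."""
    if not others:
        return 0.0
    return sum(1.0 - word_overlap_similarity(sample, o) for o in others) / len(others)

def deviation_weighted_loss(base_loss: float, dev: float, alpha: float = 1.0) -> float:
    """Scale a per-sample loss so high-deviation (unusual) samples
    contribute more to training."""
    return (1.0 + alpha * dev) * base_loss

# Three same-prompt samples: two near-duplicates and one unusual one.
samples = [
    "the sun set over the sea",
    "the sun set over the hills",
    "a clockwork heron unfolded its brass wings",
]
devs = [deviation(s, samples[:i] + samples[i + 1:]) for i, s in enumerate(samples)]
# The unusual third sample has the highest deviation, so its training
# signal is amplified relative to the two near-duplicates.
```

In the paper this weighting is folded into DPO and ORPO objectives rather than applied to a generic loss, but the mechanism is the same: unusual high-quality samples get a stronger gradient signal.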

Why it matters?

This work matters because it can lead to AI tools that help writers generate more creative and original content, rather than variations on the same predictable text.

Abstract

As creative writing tasks do not have singular correct answers, large language models (LLMs) trained to perform these tasks should be able to generate diverse valid outputs. However, LLM post-training often focuses on improving generation quality but neglects to facilitate output diversity. Hence, in creative writing generation, we investigate post-training approaches to promote both output diversity and quality. Our core idea is to include deviation -- the degree of difference between a training sample and all other samples with the same prompt -- in the training objective to facilitate learning from rare high-quality instances. By applying our approach to direct preference optimization (DPO) and odds ratio preference optimization (ORPO), we demonstrate that we can promote the output diversity of trained models while minimally decreasing quality. Our best model with 8B parameters could achieve diversity on par with a human-created dataset while having output quality similar to the best instruction-tuned models we examined, GPT-4o and DeepSeek-R1. We further validate our approaches with a human evaluation, an ablation, and a comparison to an existing diversification approach, DivPO.