TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas, Laura Diosan, Andrei Piscoran, Andreea Tomescu
2025-05-02
Summary
This paper introduces TF1-EN-3M, a new collection of three million English moral fables generated by AI, intended as training data for small language models.
What's the problem?
Small language models often don't have enough high-quality, creative stories to learn from, which makes it hard for them to understand or generate interesting and meaningful tales, especially ones that teach a lesson.
What's the solution?
The researchers used instruction-tuned AI models to generate millions of fables from structured prompts, checked their quality with both automated metrics and human reviewers, and released the whole dataset for anyone to use.
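A minimal sketch of how structured, combinatorial prompt generation of this kind can work. The ingredient slots, values, and template wording below are invented for illustration and are not taken from the paper:

```python
import itertools

# Hypothetical ingredient lists -- the actual dataset's prompt slots
# and values are defined by the paper's authors and may differ.
characters = ["a proud fox", "a patient tortoise", "a curious sparrow"]
settings = ["an ancient forest", "a busy marketplace"]
conflicts = ["is tempted to cheat", "ignores a friend's warning"]
morals = ["honesty earns trust", "patience wins the day"]

# Illustrative instruction template for an instruction-tuned model.
TEMPLATE = (
    'Write a short moral fable in which {character} in {setting} '
    '{conflict}. End with the moral: "{moral}".'
)

def build_prompts():
    """Yield one instruction prompt per combination of ingredients."""
    for char, setting, conflict, moral in itertools.product(
        characters, settings, conflicts, morals
    ):
        yield TEMPLATE.format(
            character=char, setting=setting, conflict=conflict, moral=moral
        )

prompts = list(build_prompts())
print(len(prompts))  # 3 * 2 * 2 * 2 = 24 distinct prompts
```

Scaling the slot vocabularies up is what makes millions of distinct prompts (and hence fables) possible from a handful of short lists.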
Why does it matter?
This matters because it gives AI developers and teachers a massive, free resource to help train and improve small language models, making them better at storytelling, understanding morals, and being creative.
Abstract
TF1-EN-3M is a new dataset of three million English fables generated by instruction-tuned models following a structured format. Its quality was evaluated using a combination of automated metrics and human judgments, and it is released under a permissive license.