Learning the Latent Rules of a Game from Data: A Chess Story
Ben Fauber
2024-10-04

Summary
This paper shows that small pretrained language models can learn the rules of chess from data alone, generating legal moves and accurately solving chess problems.
What's the problem?
Language models often struggle with rule-governed tasks like chess. Although records of chess games are abundant, it is unclear how many examples a model needs before it internalizes the rules well enough to play correctly. With too little quality data, models tend to generate invalid moves and fail at chess-related problems.
What's the solution?
To overcome this challenge, the authors took small pretrained language models (28M and 125M parameters) and instruction fine-tuned them on between 1,000 and 1,000,000 examples of chess data, training them to follow the rules of chess and propose legal moves. The study also examined how successive fine-tuning epochs improved performance, and showed that increasing the number of fine-tuning examples reduced hallucinations, i.e., illegal or invalid generated moves (see the sketch below).
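
The paper does not include its training code, but the recipe maps naturally onto standard instruction fine-tuning of a causal language model. The sketch below is a minimal, illustrative version: the facebook/opt-125m checkpoint, the prompt format, and the hyperparameters are all assumptions for illustration, not the authors' exact setup.

```python
# Minimal sketch of instruction fine-tuning a ~125M-parameter SLM on chess
# move data. Model choice, prompt format, and hyperparameters are assumed
# for illustration and are not the paper's exact configuration.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "facebook/opt-125m"  # assumed stand-in for the 125M SLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Hypothetical instruction pairs: a game prefix as the prompt, the next
# legal move (in standard algebraic notation) as the target completion.
# The paper's experiments scale this from 1,000 to 1,000,000 examples.
examples = [
    {"prompt": "Given the opening 1. e4 e5 2. Nf3, propose a legal move:",
     "completion": " Nc6"},
]

def tokenize(batch):
    text = [p + c for p, c in zip(batch["prompt"], batch["completion"])]
    tok = tokenizer(text, truncation=True, padding="max_length", max_length=128)
    # Standard causal-LM objective: labels mirror the inputs, with padding
    # positions masked out of the loss via the -100 sentinel.
    tok["labels"] = [[t if t != tokenizer.pad_token_id else -100 for t in ids]
                     for ids in tok["input_ids"]]
    return tok

train_data = Dataset.from_list(examples).map(
    tokenize, batched=True, remove_columns=["prompt", "completion"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="chess-slm",
        num_train_epochs=3,  # the study varies epoch count and tracks outcomes
        per_device_train_batch_size=8,
        learning_rate=5e-5,
    ),
    train_dataset=train_data,
)
trainer.train()
```

Scaling the `examples` list from 1,000 to 1,000,000 pairs, and varying `num_train_epochs`, reproduces the two axes the study explores.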
Why it matters?
This research is significant because it shows that even small language models, given enough fine-tuning examples, can learn a complex rule-governed task like chess. By demonstrating that these models can infer and apply the latent rules of a game, it opens up possibilities for using similar techniques in other domains where the underlying rules of a process must be learned from data.
Abstract
We demonstrate that small pretrained foundational generative language models with millions of parameters can learn the latent rules of a process from data associated with the process. Inspired by Stefan Zweig's novella "Schachnovelle," also known as "The Royal Game" in English, we show that 28M and 125M parameter pretrained foundational small language models (SLMs) can be instruction fine-tuned with 1,000-to-1,000,000 examples to learn the rules of chess, propose legal moves, and accurately solve chess problems. We also explore the impact of successive language model fine-tuning epochs on improved outcomes and demonstrate reductions in model hallucinations by increasing the number of instruction fine-tuning examples.
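
One way to quantify hallucinations in this setting is to check each generated move for legality against the current board position. The sketch below uses the python-chess library for this; both the library choice and the helper function are assumptions for illustration, as the paper does not specify its evaluation tooling.

```python
# Minimal sketch of scoring generated moves for legality with python-chess.
# The library choice and this helper are assumptions for illustration; the
# paper does not specify its evaluation tooling.
import chess

def hallucination_rate(games: list[list[str]]) -> float:
    """Fraction of model-proposed moves (in SAN) that are illegal.

    Each game is a list of SAN moves; every move is checked against the
    legal moves of the board position reached so far.
    """
    illegal, total = 0, 0
    for moves in games:
        board = chess.Board()
        for san in moves:
            total += 1
            try:
                board.push_san(san)  # raises ValueError on illegal/invalid SAN
            except ValueError:
                illegal += 1
                break  # stop scoring this game once the position is broken
    return illegal / total if total else 0.0

# Example: "Nf6" is illegal as White's first move, so one of four moves fails.
print(hallucination_rate([["e4", "e5", "Nf3"], ["Nf6"]]))  # 0.25
```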