On Memorization of Large Language Models in Logical Reasoning
Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar
2024-10-31

Summary
This paper investigates the extent to which large language models (LLMs) rely on memorization when solving logical reasoning problems, using a new, dynamically generated benchmark based on Knights and Knaves puzzles.
What's the problem?
While LLMs perform well on many reasoning benchmarks, they sometimes make basic mistakes, which raises questions about how they actually reason. One hypothesis is that their high scores come from memorizing similar problems rather than genuinely solving them. This makes it unclear whether LLMs are truly capable of reasoning or are merely recalling information they have seen before.
What's the solution?
The authors investigate this by building a benchmark that dynamically generates Knights and Knaves (K&K) logical reasoning puzzles, in which each character is either a knight who always tells the truth or a knave who always lies, and the task is to infer each character's role from their statements (a minimal sketch appears below). They find that fine-tuned LLMs achieve near-perfect accuracy on the training puzzles yet fail when those puzzles are slightly perturbed, indicating heavy reliance on memorization. At the same time, fine-tuning also consistently improves the models' ability to generalize and reason correctly on unseen puzzles. Their analysis reveals a complex interplay between memorization and genuine reasoning skills in LLMs.
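To make the setup concrete, here is a minimal, hypothetical sketch (not the authors' released generator) of how a K&K puzzle can be solved by brute force over role assignments, and how flipping a single statement, a small perturbation, changes the unique answer. A model that memorized the original puzzle would keep producing the old answer on the perturbed version.

```python
from itertools import product

# A knight's statement is true; a knave's statement is false. A solution
# is an assignment of roles under which every statement's truth value
# matches its speaker's role (True = knight).

def solve(statements):
    """statements[i] maps an assignment (tuple of bools) to the truth
    value of person i's claim; returns all consistent assignments."""
    n = len(statements)
    return [a for a in product([True, False], repeat=n)
            if all(stmt(a) == a[i] for i, stmt in enumerate(statements))]

# Original puzzle: A says "B is a knave"; B says "A and I are the same kind".
original = [lambda a: not a[1],
            lambda a: a[0] == a[1]]

# Perturbed puzzle (one statement flipped): A says "B is a knight".
perturbed = [lambda a: a[1],
             lambda a: a[0] == a[1]]

print(solve(original))   # [(True, False)] -> A is a knight, B is a knave
print(solve(perturbed))  # [(True, True)]  -> both are knights
```

Because a one-word change to a statement yields a different ground-truth answer, such perturbations separate models that re-derive the solution from models that recall it.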
Why it matters?
This research is important because it sheds light on how LLMs learn and solve problems, helping us understand their capabilities and limitations. By distinguishing between memorization and reasoning, we can develop better training methods for these models, leading to more reliable AI systems that can tackle a wider range of tasks without simply recalling past examples.
Abstract
Large language models (LLMs) achieve good performance on challenging reasoning benchmarks, yet could also make basic reasoning mistakes. This contrasting behavior is puzzling when it comes to understanding the mechanisms behind LLMs' reasoning capabilities. One hypothesis is that the increasingly high and nearly saturated performance on common reasoning benchmarks could be due to the memorization of similar problems. In this paper, we systematically investigate this hypothesis with a quantitative measurement of memorization in reasoning tasks, using a dynamically generated logical reasoning benchmark based on Knights and Knaves (K&K) puzzles. We find that LLMs can interpolate the training puzzles (achieving near-perfect accuracy) after fine-tuning, yet fail when those puzzles are slightly perturbed, suggesting that the models heavily rely on memorization to solve those training puzzles. On the other hand, we show that while fine-tuning leads to heavy memorization, it also consistently improves generalization performance. In-depth analyses with perturbation tests, cross-difficulty-level transferability, probing model internals, and fine-tuning with wrong answers suggest that the LLMs learn to reason on K&K puzzles despite training data memorization. This phenomenon indicates that LLMs exhibit a complex interplay between memorization and genuine reasoning abilities. Finally, our analysis with a per-sample memorization score sheds light on how LLMs switch between reasoning and memorization in solving logical puzzles. Our code and data are available at https://memkklogic.github.io.
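The per-sample memorization score mentioned in the abstract can be pictured with the following hedged sketch; the paper's exact definition may differ. The idea is to flag a puzzle as memorized when the model solves the seen version but fails a locally perturbed variant. `Puzzle`, `model`, and `perturb` here are hypothetical stand-ins, not the paper's API.

```python
from collections import namedtuple

# Hypothetical container: a puzzle's text and its ground-truth answer.
Puzzle = namedtuple("Puzzle", ["text", "answer"])

def memorization_flags(model, puzzles, perturb):
    """model(puzzle) -> predicted answer; perturb(puzzle) -> a slightly
    altered Puzzle with its own ground-truth answer. Flags a puzzle as
    'memorized' (1.0) when the model solves the original but fails its
    perturbed variant, and 0.0 otherwise (consistent reasoning or
    consistent failure)."""
    flags = []
    for p in puzzles:
        q = perturb(p)
        solved_orig = model(p) == p.answer
        solved_pert = model(q) == q.answer
        flags.append(float(solved_orig and not solved_pert))
    return flags
```

Averaging these flags over a dataset gives one plausible aggregate measure of how much a model's apparent accuracy rests on recall rather than reasoning.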