Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
Zhenglin Wang, Jialong Wu, Pengfei LI, Yong Jiang, Deyu Zhou
2025-02-25
Summary
This paper introduces a benchmark for testing how well AI language models understand and reason about time, using Chinese history and culture as the setting.
What's the problem?
Existing tests of how well AI understands time are too simple and cover too few real-world situations. They are mostly built from fixed rules, provide little context, involve only a narrow range of time-related entities, and do not consider how different historical events or cultural elements relate to each other in time.
What's the solution?
The researchers created a new benchmark called Chinese Time Reasoning (CTM) that uses the long and complex history of Chinese dynasties to challenge AI models. It tests how well a model understands the relationships between historical figures, events, and cultural elements across time periods. It also checks whether the model can align pairs of these entities in time and reason about them in their cultural context.
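To make the idea of aligning entities in time concrete, here is a minimal sketch of the kind of check such a question implies: do two historical figures fall within the same dynasty, and in what order did they live? This is only an illustration, not the benchmark's actual data format or scoring method. The `Entity` class, the helper functions, and the small `DYNASTIES` table are all hypothetical names introduced for this example, and the years are commonly cited approximate dates, not values taken from CTM.

```python
from dataclasses import dataclass

# Commonly cited dynasty spans in years CE (illustrative, not from the CTM data).
DYNASTIES = {
    "Tang": (618, 907),
    "Song": (960, 1279),
    "Ming": (1368, 1644),
}


@dataclass(frozen=True)
class Entity:
    """A historical figure or event with a start and end year (hypothetical type)."""
    name: str
    start: int
    end: int


def dynasties_of(entity):
    """Return every dynasty whose span overlaps the entity's span."""
    return [
        name
        for name, (first, last) in DYNASTIES.items()
        if entity.start <= last and first <= entity.end
    ]


def same_dynasty(a, b):
    """Pairwise alignment: do the two entities share at least one dynasty?"""
    return bool(set(dynasties_of(a)) & set(dynasties_of(b)))


def chronological(entities):
    """Order entities by their start year."""
    return sorted(entities, key=lambda e: e.start)


li_bai = Entity("Li Bai", 701, 762)
du_fu = Entity("Du Fu", 712, 770)
su_shi = Entity("Su Shi", 1037, 1101)

print(same_dynasty(li_bai, du_fu))   # both poets lived under the Tang
print(same_dynasty(li_bai, su_shi))  # Tang versus Song
print([e.name for e in chronological([su_shi, du_fu, li_bai])])
```

A lookup like this is easy for a program that already has the dates. The difficulty the benchmark targets is that a language model must recall and combine this information from context, without a table of years to consult.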
Why does it matter?
Understanding time is crucial for AI to be truly helpful in real-world situations. Using Chinese history as the test bed shows how well AI handles complex, culturally specific information about time. The results could help make AI better at tasks that involve history, culture, and the way events relate to each other over time, which is important for fields like education and research, and even for everyday conversations about history and culture.
Abstract
Temporal reasoning is fundamental to human cognition and is crucial for various real-world applications. While recent advances in Large Language Models have demonstrated promising capabilities in temporal reasoning, existing benchmarks primarily rely on rule-based construction, lack contextual depth, and involve a limited range of temporal entities. To address these limitations, we introduce Chinese Time Reasoning (CTM), a benchmark designed to evaluate LLMs on temporal reasoning within the extensive scope of Chinese dynastic chronology. CTM emphasizes cross-entity relationships, pairwise temporal alignment, and contextualized and culturally-grounded reasoning, providing a comprehensive evaluation. Extensive experimental results reveal the challenges posed by CTM and highlight potential avenues for improvement.