GRS-QA: Graph Reasoning-Structured Question Answering Dataset
Anish Pahilajani, Devasha Trivedi, Jincen Shuai, Khin S. Yone, Samyak Rajesh Jain, Namyong Park, Ryan A. Rossi, Nesreen K. Ahmed, Franck Dernoncourt, Yu Wang
2024-11-04

Summary
This paper introduces the Graph Reasoning-Structured Question Answering Dataset (GRS-QA), a new dataset designed to evaluate and improve how large language models (LLMs) answer complex questions that require multiple steps of reasoning.
What's the problem?
While LLMs have become good at answering questions, they often struggle with multi-hop question answering (M-QA), where the answer depends on connecting information from multiple pieces of text. Existing datasets do not make explicit the reasoning paths needed to arrive at an answer, which makes it hard to evaluate how well these models reason through complex questions.
What's the solution?
The authors created GRS-QA, which pairs each question and answer with an explicit reasoning structure. They built reasoning graphs in which nodes are supporting text passages and edges capture the logical flow between them, allowing a finer-grained evaluation of how LLMs handle different reasoning structures. The dataset helps researchers understand how well LLMs can follow different reasoning paths when answering questions.
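To make the graph representation concrete, here is a minimal Python sketch, not the authors' actual schema: the class name `ReasoningGraphQA`, its fields, and the toy two-hop example are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from collections import defaultdict, deque

@dataclass
class ReasoningGraphQA:
    """A QA pair together with its reasoning graph (illustrative only).

    Nodes hold supporting text contexts; a directed edge (u, v) means the
    fact at node u logically feeds into the fact at node v.
    """
    question: str
    answer: str
    nodes: dict[int, str]                        # node id -> textual context
    edges: list[tuple[int, int]] = field(default_factory=list)

    def reasoning_order(self) -> list[int]:
        """Return node ids in topological order, i.e. one valid logical flow."""
        indegree = {n: 0 for n in self.nodes}
        children = defaultdict(list)
        for u, v in self.edges:
            children[u].append(v)
            indegree[v] += 1
        queue = deque(n for n, d in indegree.items() if d == 0)
        order = []
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in children[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    queue.append(v)
        return order

# Invented two-hop example: two supporting facts chained to answer one question.
example = ReasoningGraphQA(
    question="Which country is the author of 'Norwegian Wood' from?",
    answer="Japan",
    nodes={0: "'Norwegian Wood' was written by Haruki Murakami.",
           1: "Haruki Murakami is a Japanese writer."},
    edges=[(0, 1)],
)
print(example.reasoning_order())  # -> [0, 1]
```

The point of the sketch is simply that the reasoning path is stored explicitly alongside the question, rather than being implicit in the supporting passages.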
Why it matters?
This research is important because it provides a better way to assess and improve the reasoning abilities of AI models. By using GRS-QA, researchers can develop more effective LLMs that can tackle complex questions in a way that mimics human thinking, leading to advancements in fields like education, customer support, and information retrieval.
Abstract
Large Language Models (LLMs) have excelled in multi-hop question-answering (M-QA) due to their advanced reasoning abilities. However, the impact of the inherent reasoning structures on LLM M-QA performance remains unclear, largely due to the absence of QA datasets that provide fine-grained reasoning structures. To address this gap, we introduce the Graph Reasoning-Structured Question Answering Dataset (GRS-QA), which includes both semantic contexts and reasoning structures for QA pairs. Unlike existing M-QA datasets, where different reasoning structures are entangled together, GRS-QA explicitly captures intricate reasoning pathways by constructing reasoning graphs, where nodes represent textual contexts and edges denote logical flows. These reasoning graphs of different structures enable a fine-grained evaluation of LLM reasoning capabilities across various reasoning structures. Our empirical analysis reveals that LLMs perform differently when handling questions with varying reasoning structures. This finding facilitates the exploration of textual structures as compared with semantics.
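As a rough illustration of the structure-wise evaluation the abstract describes, the sketch below groups QA pairs by a reasoning-structure label and scores each group separately; the dictionary keys, structure labels, and exact-match metric are illustrative assumptions, not the paper's actual protocol.

```python
from collections import defaultdict

def exact_match(pred: str, gold: str) -> bool:
    """Simple normalized exact-match metric (illustrative)."""
    return pred.strip().lower() == gold.strip().lower()

def accuracy_by_structure(examples, predict):
    """Score a QA system separately for each reasoning-structure type.

    `examples` is an iterable of dicts with assumed keys 'question',
    'answer', and 'structure' (e.g. '2-hop chain', 'comparison');
    `predict` is any callable mapping a question string to an answer string.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        total[ex["structure"]] += 1
        if exact_match(predict(ex["question"]), ex["answer"]):
            correct[ex["structure"]] += 1
    return {s: correct[s] / total[s] for s in total}
```

Breaking scores out per structure in this way is what lets one observe, as the abstract reports, that LLM performance varies with the shape of the required reasoning.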