
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning

Yuwei Yin, Giuseppe Carenini

2025-02-10


Summary

This paper introduces ARR, a prompting method that helps large language models (LLMs) answer questions more accurately by breaking the process into three explicit steps: analyzing the question's intent, retrieving relevant information, and reasoning step by step.

What's the problem?

Current methods for guiding LLMs, like Chain-of-Thought (CoT) prompting, often give vague instructions like 'think step by step,' which can lead to unclear or incomplete answers. These methods don't provide enough structure for the model to fully understand and solve complex questions.

What's the solution?

The researchers created ARR, a new prompting technique that explicitly guides LLMs through three steps: first, understanding the intent of the question (analyzing), then finding relevant information (retrieving), and finally reasoning through the answer step by step. This structured approach improves the model's ability to handle challenging question-answering tasks and consistently outperforms existing methods like CoT.
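Since ARR is a zero-shot prompting technique, it can be sketched as a small change to how a QA prompt is assembled. The snippet below is an illustrative sketch, not the authors' code: the exact trigger wording paraphrases the three steps described in the paper, and the helper function and example question are invented for demonstration.

```python
# Hedged sketch of zero-shot CoT vs. ARR-style trigger phrases.
# ARR_TRIGGER paraphrases the paper's three steps (analyze, retrieve,
# reason); the exact wording and build_qa_prompt helper are assumptions.

COT_TRIGGER = "Let's think step by step."
ARR_TRIGGER = (
    "Let's analyze the intent of the question, find relevant "
    "information, and answer the question with step-by-step reasoning."
)

def build_qa_prompt(question: str, options: list[str], trigger: str) -> str:
    """Format a multiple-choice QA prompt with a reasoning trigger appended."""
    lines = [f"Question: {question}", "Options:"]
    lines += [f"  ({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)]
    lines.append(f"Answer: {trigger}")
    return "\n".join(lines)

# Example: the only difference between CoT and ARR is the trigger text.
prompt = build_qa_prompt(
    "Which planet is known as the Red Planet?",
    ["Venus", "Mars", "Jupiter"],
    ARR_TRIGGER,
)
print(prompt)
```

The model then continues generating after the trigger, so the structured instruction steers it to state the question's intent and the supporting facts before committing to an answer, rather than relying on the generic "think step by step" cue.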

Why it matters?

This matters because it makes AI models better at answering difficult questions in a logical and accurate way. By improving how LLMs think through problems, ARR can enhance their usefulness in fields like education, research, and customer support, where reliable answers are crucial.

Abstract

Large language models (LLMs) achieve remarkable performance on challenging benchmarks that are often structured as multiple-choice question-answering (QA) tasks. Zero-shot Chain-of-Thought (CoT) prompting enhances reasoning in LLMs but provides only vague and generic guidance ("think step by step"). This paper introduces ARR, an intuitive and effective zero-shot prompting method that explicitly incorporates three key steps in QA solving: analyzing the intent of the question, retrieving relevant information, and reasoning step by step. Comprehensive experiments across diverse and challenging QA tasks demonstrate that ARR consistently improves the Baseline (without ARR prompting) and outperforms CoT. Ablation and case studies further validate the positive contributions of each component: analyzing, retrieving, and reasoning. Notably, intent analysis plays a vital role in ARR. Additionally, extensive evaluations across various model sizes, LLM series, and generation settings solidify the effectiveness, robustness, and generalizability of ARR.