HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering

Duolin Sun, Dan Yang, Yue Shen, Yihan Jiao, Zhehao Tan, Jie Feng, Lianzhen Zhong, Jian Wang, Peng Wei, Jinjie Gu

2025-09-15


Summary

This paper introduces HANRAG, a new system that improves how computers answer questions by combining information retrieval with large language models, especially for questions that require several reasoning steps.

What's the problem?

Current question-answering systems that use this combined approach struggle with complex questions that require multiple steps of reasoning. They often waste effort by searching for information again and again, or they get confused by irrelevant retrieved information that muddies the results, leading to inaccurate answers. Essentially, they have trouble breaking big questions down into smaller, manageable parts and filtering out the noise.

What's the solution?

HANRAG addresses this by acting like a smart router and filter. It first judges how complex a question is, then breaks it down into simpler sub-questions when needed. It retrieves information for each sub-question and, importantly, filters out irrelevant information before using it to generate an answer. This makes the system more efficient and accurate, even on difficult questions.
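
To make the route, decompose, retrieve, and filter flow concrete, here is a minimal Python sketch under loudly stated assumptions. The corpus, the keyword router (any query containing the word "and" counts as compound), the split-on-"and" decomposer, and the term-overlap retriever and noise filter are all hypothetical stand-ins invented for illustration; HANRAG itself relies on its revelator for these decisions, and the final answer-generation step is omitted entirely.

```python
import re

# Hypothetical toy corpus standing in for an external knowledge base.
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is located in Paris.",
    "France borders Germany and Spain.",
    "Bananas are rich in potassium.",
]

# Small illustrative stopword list so that only content words are matched.
STOPWORDS = {"what", "is", "the", "of", "and", "a", "an", "in", "are", "which"}


def terms(text: str) -> set[str]:
    """Lowercased content words of `text`, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def route(query: str) -> str:
    """Assumed toy router (not the paper's): the word 'and' marks a compound query."""
    return "compound" if re.search(r"\band\b", query, re.IGNORECASE) else "single"


def decompose(query: str) -> list[str]:
    """Split a compound query on 'and' into independent sub-queries (illustrative rule)."""
    stripped = query.rstrip("?")
    if route(query) == "single":
        return [stripped]
    parts = re.split(r"\s+and\s+", stripped, flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def retrieve(sub_query: str, corpus: list[str], k: int = 3) -> list[tuple[str, int]]:
    """Rank documents by term overlap with the sub-query and return the top k."""
    q = terms(sub_query)
    scored = [(doc, len(q & terms(doc))) for doc in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep corpus order
    return scored[:k]


def filter_noise(scored_docs: list[tuple[str, int]], min_overlap: int = 2) -> list[str]:
    """Drop documents below an overlap threshold (stand-in for a learned relevance judge)."""
    return [doc for doc, score in scored_docs if score >= min_overlap]


def hanrag_like_pipeline(query: str, corpus: list[str]) -> dict[str, list[str]]:
    """Route, decompose, retrieve per sub-query, then filter; generation is omitted."""
    evidence = {}
    for sub_query in decompose(query):
        evidence[sub_query] = filter_noise(retrieve(sub_query, corpus))
    return evidence


if __name__ == "__main__":
    question = "What is the capital of France and the capital of Germany?"
    print(route(question))
    for sub_q, docs in hanrag_like_pipeline(question, CORPUS).items():
        print(sub_q, "->", docs)
```

Running it prints the route label followed by one line per sub-question with its filtered evidence. In this toy setup the overlap threshold in `filter_noise` does the noise removal: documents that share only one term with a sub-question are retrieved but discarded before they could reach a generator.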

Why does it matter?

This research is important because it makes question-answering systems more reliable and capable. By improving how these systems handle complex questions, we can build better chatbots, virtual assistants, and tools for accessing and understanding information, ultimately leading to more helpful and intelligent AI.

Abstract

The Retrieval-Augmented Generation (RAG) approach enhances question-answering systems and dialogue generation tasks by integrating information retrieval (IR) technologies with large language models (LLMs). This strategy, which retrieves information from external knowledge bases to bolster the response capabilities of generative models, has achieved some success. However, current RAG methods still face numerous challenges when dealing with multi-hop queries. For instance, some approaches rely too heavily on iterative retrieval, wasting too many retrieval steps on compound queries. Additionally, using the original complex query for retrieval may fail to capture content relevant to specific sub-queries, resulting in noisy retrieved content. If this noise is not managed, it can lead to noise accumulation. To address these issues, we introduce HANRAG, a novel heuristic-based framework designed to efficiently tackle problems of varying complexity. Driven by a powerful revelator, HANRAG routes queries, decomposes them into sub-queries, and filters noise from retrieved documents. This enhances the system's adaptability and noise resistance, making it highly capable of handling diverse queries. We compare the proposed framework against other leading industry methods across various benchmarks. The results demonstrate that our framework achieves superior performance in both single-hop and multi-hop question-answering tasks.
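
The point about retrieving with the original complex query can be illustrated with a toy example. The sketch below assumes a simple lexical term-overlap score (not the retriever used in the paper) and a hypothetical three-document corpus in which one distractor merely mentions both entities. Scored against the full compound question, the distractor ties with the two relevant documents, so no overlap threshold can separate them; scored against each sub-query, it falls below the relevant document and a threshold of two shared terms removes it. This shows the mechanism only and is not a result from the paper.

```python
import re

# Hypothetical toy corpus; the third document is a distractor naming both entities.
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "France borders Germany and Spain.",
]

STOPWORDS = {"what", "is", "the", "of", "and", "a", "an", "in"}


def terms(text: str) -> set[str]:
    """Lowercased content words of `text`, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def overlap_scores(query: str, corpus: list[str]) -> dict[str, int]:
    """Term-overlap score of every document against a single query (assumed scorer)."""
    q = terms(query)
    return {doc: len(q & terms(doc)) for doc in corpus}


def kept(query: str, corpus: list[str], min_overlap: int = 2) -> list[str]:
    """Documents that survive a fixed overlap threshold for this query."""
    return [doc for doc, s in overlap_scores(query, corpus).items() if s >= min_overlap]


if __name__ == "__main__":
    full_query = "What is the capital of France and the capital of Germany?"
    sub_queries = ["What is the capital of France", "What is the capital of Germany"]

    # Against the full compound query, the distractor ties with the relevant documents.
    print(kept(full_query, CORPUS))

    # Against each sub-query, only the matching document clears the threshold.
    for sq in sub_queries:
        print(sq, "->", kept(sq, CORPUS))
```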
