MoM: Mixtures of Scenario-Aware Document Memories for Retrieval-Augmented Generation Systems

Jihao Zhao, Zhiyuan Ji, Simin Niu, Hanyu Wang, Feiyu Xiong, Zhiyu Li

2025-10-17

Summary

This paper introduces a new way to improve how Retrieval-Augmented Generation (RAG) systems work, moving beyond simply finding relevant text pieces to actually *understanding* the documents they're using. The authors call this approach 'Mixtures of scenario-aware document Memories', or MoM.

What's the problem?

Traditional RAG systems often struggle because they just grab chunks of text without truly grasping the overall meaning or important connections within a document. This limits how well they can answer complex questions or reason about the information. It's like trying to understand a book by only reading random paragraphs – you miss the bigger picture and the author's thought process.

What's the solution?

The researchers developed MoM, which first uses a powerful language model to create a logical outline of a document, much as a domain expert would. This outline guides the system in structuring the document into chunks and extracting its core content to create 'memories'. To pick the best memories, MoM samples multiple candidate versions and evaluates them from several perspectives, using metrics for how clear the chunks are and how completely they capture the document. It then trains smaller language models to proactively build these memories themselves, using 'reverse reasoning' to work backwards from high-quality results to the expert thinking path that would produce them. Finally, the researchers built a three-layer retrieval system that draws on these memories, grounded in probabilistic modeling.
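The multi-path sampling and selection step above can be sketched in miniature. Everything here is illustrative: `DocumentMemory` and the two scoring functions are hypothetical stand-ins for the paper's actual metrics, which the summary only describes as measuring chunk clarity and extraction completeness.

```python
from dataclasses import dataclass

@dataclass
class DocumentMemory:
    """Hypothetical container for one sampled memory of a document."""
    outline: list[str]   # logical outline headings
    chunks: list[str]    # structured text chunks
    core: str            # extracted core content

def clarity_score(memory: DocumentMemory) -> float:
    """Toy proxy for chunk clarity: favor evenly sized chunks."""
    if not memory.chunks:
        return 0.0
    lengths = [len(c) for c in memory.chunks]
    mean = sum(lengths) / len(lengths)
    variance = sum((l - mean) ** 2 for l in lengths) / len(lengths)
    return 1.0 / (1.0 + variance / max(mean, 1.0))

def completeness_score(memory: DocumentMemory, document: str) -> float:
    """Toy proxy for extraction completeness: fraction of the source covered."""
    covered = sum(len(c) for c in memory.chunks)
    return min(covered / max(len(document), 1), 1.0)

def select_best_memory(candidates: list[DocumentMemory],
                       document: str) -> DocumentMemory:
    """Multi-perspective evaluation: combine both metrics, keep the best path."""
    return max(candidates,
               key=lambda m: clarity_score(m) + completeness_score(m, document))
```

In this sketch, each candidate comes from one sampling "path", and the selection step mirrors the paper's idea of scoring candidates along multiple axes before committing to one document memory.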

Why it matters?

This work is important because it makes RAG systems much more effective at understanding and using information from documents. It not only improves the quality of the text chunks used by these systems, but also allows smaller, more efficient language models to perform complex tasks that usually require much larger models, bringing us closer to AI that can process text more like a human.

Abstract

The traditional RAG paradigm, which typically engages in the comprehension of relevant text chunks in response to received queries, inherently restricts both the depth of knowledge internalization and reasoning capabilities. To address this limitation, our research transforms the text processing in RAG from passive chunking to proactive understanding, defining this process as document memory extraction with the objective of simulating human cognitive processes during reading. Building upon this, we propose the Mixtures of scenario-aware document Memories (MoM) framework, engineered to efficiently handle documents from multiple domains and train small language models (SLMs) to acquire the ability to proactively explore and construct document memories. The MoM initially instructs large language models (LLMs) to simulate domain experts in generating document logical outlines, thereby directing structured chunking and core content extraction. It employs a multi-path sampling and multi-perspective evaluation mechanism, specifically designing comprehensive metrics that represent chunk clarity and extraction completeness to select the optimal document memories. Additionally, to infuse deeper human-like reading abilities during the training of SLMs, we incorporate a reverse reasoning strategy, which deduces refined expert thinking paths from high-quality outcomes. Finally, leveraging diverse forms of content generated by MoM, we develop a three-layer document memory retrieval mechanism, which is grounded in our theoretical proof from the perspective of probabilistic modeling. Extensive experimental results across three distinct domains demonstrate that the MoM framework not only resolves text chunking challenges in existing RAG systems, providing LLMs with semantically complete document memories, but also paves the way for SLMs to achieve human-centric intelligent text processing.
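The three-layer retrieval mechanism described in the abstract can be illustrated with a minimal sketch. The layer names, weights, and lexical-overlap similarity below are all assumptions; the paper grounds its mechanism in probabilistic modeling, which here is reduced to a simple weighted mixture over outline, chunk, and core-content scores.

```python
from collections import Counter

def similarity(query: str, text: str) -> float:
    """Toy lexical-overlap score standing in for a learned retriever."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    overlap = sum((q & t).values())
    return overlap / max(sum(q.values()), 1)

def three_layer_score(query: str, outline: str, chunk: str, core: str,
                      weights: tuple[float, float, float] = (0.2, 0.5, 0.3)) -> float:
    """Mixture over three memory layers: score = sum_i w_i * s_i.

    With weights summing to 1 and each s_i in [0, 1], the combined
    score stays in [0, 1], loosely mirroring a probability of relevance.
    """
    scores = (similarity(query, outline),
              similarity(query, chunk),
              similarity(query, core))
    return sum(w * s for w, s in zip(weights, scores))
```

Ranking candidate memories by this combined score gives a rough picture of how evidence from all three layers could be fused, though the actual framework's derivation and scoring are more involved than this toy version.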