Beyond Fact Retrieval: Episodic Memory for RAG with Generative Semantic Workspaces

Shreyas Rajesh, Pavan Holur, Chenda Duan, David Chong, Vwani Roychowdhury

2025-11-12

Summary

This paper introduces a new way for Large Language Models (LLMs) to handle and understand very long pieces of text, like entire stories or documents, by giving them a better 'memory' system.

What's the problem?

LLMs struggle with long texts because they can only 'look at' a limited amount of information at once. When a text exceeds that window, important details are dropped, and even when the text does fit, performance degrades as it gets longer. Current fixes, such as retrieval-augmented generation (RAG), which fetches relevant snippets on demand, work well for looking up facts but don't help the LLM follow how a situation evolves over time and space within a story or event. In other words, they can't track who did what, when, and where.

What's the solution?

The researchers created something called the Generative Semantic Workspace (GSW). Think of it as a way for the LLM to build a structured, organized 'mental map' of what's happening as it reads. It has two main parts: an 'Operator' that converts each incoming passage into intermediate semantic structures (who is doing what, where, and when), and a 'Reconciler' that merges those structures into a single persistent workspace with a consistent timeline and set of locations. This lets the LLM keep track of characters, actions, and places even across very long texts. On the Episodic Memory Benchmark (EpBench), GSW performed significantly better than existing retrieval-based methods while also using fewer resources at query time.

Why it matters?

This work is important because it's a step towards giving LLMs a more human-like memory. This means they could become much better at reasoning about complex situations that unfold over time, making them more capable of acting as intelligent agents in the real world, like understanding and responding to long conversations or complex instructions.

Abstract

Large Language Models (LLMs) face fundamental challenges in long-context reasoning: many documents exceed their finite context windows, while performance on texts that do fit degrades with sequence length, necessitating their augmentation with external memory frameworks. Current solutions, which have evolved from retrieval using semantic embeddings to more sophisticated structured knowledge graph representations for improved sense-making and associativity, are tailored for fact-based retrieval and fail to build the space-time-anchored narrative representations required for tracking entities through episodic events. To bridge this gap, we propose the Generative Semantic Workspace (GSW), a neuro-inspired generative memory framework that builds structured, interpretable representations of evolving situations, enabling LLMs to reason over evolving roles, actions, and spatiotemporal contexts. Our framework comprises an Operator, which maps incoming observations to intermediate semantic structures, and a Reconciler, which integrates these into a persistent workspace that enforces temporal, spatial, and logical coherence. On the Episodic Memory Benchmark (EpBench) (Huet et al., 2025), comprising corpora ranging from 100k to 1M tokens in length, GSW outperforms existing RAG-based baselines by up to 20%. Furthermore, GSW is highly efficient, reducing query-time context tokens by 51% compared to the next most token-efficient baseline, reducing inference-time costs considerably. More broadly, GSW offers a concrete blueprint for endowing LLMs with human-like episodic memory, paving the way for more capable agents that can reason over long horizons.