MemLoRA: Distilling Expert Adapters for On-Device Memory Systems
Massimo Bini, Ondrej Bohdal, Umberto Michieli, Zeynep Akata, Mete Ozay, Taha Ceritli
2025-12-10
Summary
This paper introduces a new way to give small AI models a 'memory' so they can have better, more personalized conversations and understand images, all without needing a powerful computer or sending your data to the cloud.
What's the problem?
Large language models (LLMs) are great at remembering things during conversations, making them feel more natural and allowing for personalized experiences. However, these LLMs are too big and expensive to run directly on your phone or computer. Smaller models are better for on-device use, but they don't perform as well, and existing memory systems don't work well with images – they mostly handle text. This limits their usefulness in situations where understanding both text and visuals is important.
What's the solution?
The researchers developed a system called MemLoRA, which adds small 'memory adapters' to smaller language models. These adapters are specifically trained to handle different memory tasks like storing information, updating it, and using it to generate responses. They also created MemLoRA-V, which adds small vision-language models to the system, allowing it to understand images directly. This approach allows smaller models to perform memory operations accurately without relying on cloud computing, and it significantly improves their ability to understand visual information.
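The idea of routing each memory operation to its own small, task-specific adapter can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `MemorySystem` class, adapter names, and string-tagged outputs are all hypothetical stand-ins for loading distilled LoRA weights onto a small language model.

```python
class MemorySystem:
    """Hypothetical sketch: one small base model, with a separate
    distilled adapter per memory operation (names are illustrative)."""

    OPERATIONS = ("extract", "update", "generate")

    def __init__(self, base_model: str):
        self.base_model = base_model
        # One lightweight adapter per operation; in the paper each is
        # trained separately via knowledge distillation from a teacher.
        self.adapters = {op: f"lora_{op}" for op in self.OPERATIONS}

    def run(self, operation: str, payload: str) -> str:
        if operation not in self.adapters:
            raise ValueError(f"unknown memory operation: {operation}")
        adapter = self.adapters[operation]
        # A real system would swap the adapter weights onto the SLM
        # before inference; here we just tag the call for illustration.
        return f"{self.base_model}+{adapter}({payload})"


system = MemorySystem("slm")
print(system.run("extract", "user said they moved to Paris"))
```

Keeping one adapter per operation means the base model's weights stay fixed and only a small set of adapter parameters is swapped in per task, which is what makes the approach feasible on-device.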
Why it matters?
This work is important because it makes it possible to have AI assistants with good memories and visual understanding that can run directly on your devices, protecting your privacy and reducing reliance on the internet. The system performs surprisingly well, even outperforming much larger models in some cases, and opens the door for more capable and private on-device AI applications.
Abstract
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-based personalization is also key in on-device settings that allow users to keep their conversations and data private. However, memory-augmented systems typically rely on LLMs that are too costly for local on-device deployment. Even though Small Language Models (SLMs) are more suitable for on-device inference than LLMs, they cannot achieve sufficient performance. Additionally, these LLM-based systems lack native visual capabilities, limiting their applicability in multimodal contexts. In this paper, we introduce (i) MemLoRA, a novel memory system that enables local deployment by equipping SLMs with specialized memory adapters, and (ii) its vision extension MemLoRA-V, which integrates small Vision-Language Models (SVLMs) into memory systems, enabling native visual understanding. Following knowledge distillation principles, each adapter is trained separately for a specific memory operation: knowledge extraction, memory update, and memory-augmented generation. Equipped with memory adapters, small models enable accurate on-device memory operations without cloud dependency. On text-only operations, MemLoRA outperforms 10× larger baseline models (e.g., Gemma2-27B) and achieves performance comparable to 60× larger models (e.g., GPT-OSS-120B) on the LoCoMo benchmark. To evaluate visual understanding operations, we extend LoCoMo with challenging Visual Question Answering tasks that require direct visual reasoning. On this benchmark, our VLM-integrated MemLoRA-V shows massive improvements over caption-based approaches (81.3 vs. 23.7 accuracy) while maintaining strong performance on text-based tasks, demonstrating the efficacy of our method in multimodal contexts.