
LFM2 Technical Report

Alexander Amini, Anna Banaszak, Harold Benoit, Arthur Böök, Tarek Dakhran, Song Duong, Alfred Eng, Fernando Fernandes, Marc Härkönen, Anne Harrington, Ramin Hasani, Saniya Karwa, Yuri Khrustalev, Maxime Labonne, Mathias Lechner, Valentine Lechner, Simon Lee, Zetian Li, Noel Loo, Jacob Marks, Edoardo Mosca, Samuel J. Paech

2025-12-02


Summary

This paper introduces LFM2, a new set of AI models designed to be powerful yet small enough to run directly on devices like phones and computers, without needing a constant internet connection.

What's the problem?

Many current AI models are huge and require a lot of computing power and memory, making them impractical for use on everyday devices. Running these models on phones or laptops is slow and drains battery life. The goal was to create models that are both accurate *and* efficient enough for on-device use.

What's the solution?

The researchers developed LFM2 models ranging from 350 million to 8.3 billion parameters, using a hybrid design that mixes short convolution blocks with a small amount of attention to make the models faster and lighter on memory. They used a 'hardware-in-the-loop' process, searching for the architecture directly under the speed and memory limits of the devices it would run on. They also improved training with methods like ordering the data from easier to harder (curriculum learning), distilling knowledge from a larger teacher model, and merging multiple fine-tuned versions into one. Finally, they created versions of LFM2 that can handle not just text, but also images, audio, and information retrieval tasks.
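To make the architecture idea concrete, here is a minimal PyTorch sketch of a gated short-convolution block of the kind the abstract below describes. The projection sizes, gating function, and lack of normalization are illustrative assumptions, not the report's exact design; the point is that a short depthwise causal convolution with multiplicative gates processes the sequence cheaply and, unlike attention, needs no key-value cache.

```python
import torch
import torch.nn as nn


class GatedShortConvBlock(nn.Module):
    """Gated causal depthwise short convolution (illustrative sketch, not LFM2's exact block)."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Project the input into a gate branch and a convolution branch.
        self.in_proj = nn.Linear(dim, 2 * dim)
        # Depthwise convolution with a short kernel: cheap per token and
        # memory-light at inference time.
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim,
                              padding=kernel_size - 1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        gate, h = self.in_proj(x).chunk(2, dim=-1)
        h = self.conv(h.transpose(1, 2))[..., : x.size(1)]  # crop to keep causality
        h = torch.sigmoid(gate) * h.transpose(1, 2)          # multiplicative gating
        return self.out_proj(h)
```

In the actual LFM2 backbone, a small number of grouped query attention blocks are interleaved with many blocks of this kind, which is where the reported CPU prefill and decode speedups come from.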

Why it matters?

LFM2 is important because it brings advanced AI capabilities to devices where running such models was not previously practical. This means things like faster voice assistants, better image recognition in apps, and more responsive AI features, all without relying on a network connection. The models are also released publicly with open weights, allowing others to build upon this work and create even more innovative applications.

Abstract

We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a small number of grouped query attention blocks, delivering up to 2x faster prefill and decode on CPUs compared to similarly sized models. The LFM2 family covers 350M-8.3B parameters, including dense models (350M, 700M, 1.2B, 2.6B) and a mixture-of-experts variant (8.3B total, 1.5B active), all with 32K context length. LFM2's training pipeline includes a tempered, decoupled Top-K knowledge distillation objective that avoids support mismatch; curriculum learning with difficulty-ordered data; and a three-stage post-training recipe of supervised fine-tuning, length-normalized preference optimization, and model merging. Pre-trained on 10-12T tokens, LFM2 models achieve strong results across diverse benchmarks; for example, LFM2-2.6B reaches 79.56% on IFEval and 82.41% on GSM8K. We further build multimodal and retrieval variants: LFM2-VL for vision-language tasks, LFM2-Audio for speech, and LFM2-ColBERT for retrieval. LFM2-VL supports tunable accuracy-latency tradeoffs via token-efficient visual processing, while LFM2-Audio separates audio input and output pathways to enable real-time speech-to-speech interaction competitive with models 3x larger. LFM2-ColBERT provides a low-latency encoder for queries and documents, enabling high-performance retrieval across multiple languages. All models are released with open weights and deployment packages for ExecuTorch, llama.cpp, and vLLM, making LFM2 a practical base for edge applications that need fast, memory-efficient inference and strong task capabilities.
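As an illustration of the distillation idea mentioned above, here is a minimal PyTorch sketch of a Top-K knowledge-distillation loss. Restricting both teacher and student to the teacher's top-K tokens and renormalizing over that shared support is one way to avoid the support mismatch the abstract refers to; the temperature handling and the exact tempered, decoupled form used by LFM2 are assumptions here, not the report's objective.

```python
import torch
import torch.nn.functional as F


def topk_distill_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      k: int = 32,
                      temperature: float = 2.0) -> torch.Tensor:
    # Keep only the teacher's top-K vocabulary entries at each position.
    teacher_topk, topk_idx = teacher_logits.topk(k, dim=-1)
    student_topk = student_logits.gather(-1, topk_idx)

    # Renormalize both distributions over the same top-K support, so teacher
    # and student are compared on identical token sets.
    teacher_probs = F.softmax(teacher_topk / temperature, dim=-1)
    student_logp = F.log_softmax(student_topk / temperature, dim=-1)

    # Tempered KL(teacher || student), averaged over all positions.
    kl = (teacher_probs * (teacher_probs.clamp_min(1e-9).log() - student_logp)).sum(dim=-1)
    return (temperature ** 2) * kl.mean()
```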