SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Thang M. Pham, Phat T. Nguyen, Seunghyun Yoon, Viet Dac Lai, Franck Dernoncourt, Trung Bui
2024-11-19

Summary
This paper introduces SlimLM, a series of small language models designed for efficient document assistance on mobile devices, particularly smartphones.
What's the problem?
While small language models (SLMs) have potential for use on mobile devices, their real-world effectiveness and applications have not been thoroughly explored. Mobile devices have limitations in processing power and memory, making it challenging to run these models effectively for tasks like summarization and question answering.
What's the solution?
The authors developed SlimLM, a series of SLMs optimized for document-related tasks. They conducted extensive experiments on a Samsung Galaxy S24 to find the best balance between model size (ranging from 125 million to 7 billion parameters), context length, and processing speed. SlimLM was pre-trained on a large dataset called SlimPajama-627B and fine-tuned on DocAssist, a dataset the authors built for summarization, question answering, and suggestion tasks. The smallest version of SlimLM runs efficiently on the S24, while larger versions provide enhanced capabilities within the limits of mobile devices.
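To make the recipe concrete, here is a minimal, hypothetical sketch of the fine-tuning stage using Hugging Face Transformers. The checkpoint name (a Pythia model standing in for a ~125M-parameter SLM), the prompt template, and the toy document-assistance examples are all illustrative assumptions; it does not reproduce the paper's actual DocAssist data or training configuration.

```python
# Hypothetical sketch: instruction-style fine-tuning of a small causal LM
# on document-assistance pairs. Checkpoint, prompt format, and data are
# stand-ins, not the authors' released artifacts.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # stand-in for a ~125M-class SLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy (task, document, target) triples in the spirit of DocAssist.
examples = [
    ("Summarize the document.", "Q3 revenue rose 12% on strong ad sales...",
     "Revenue grew 12% in Q3, driven by advertising."),
    ("Who wrote the report?", "Annual report prepared by J. Smith...",
     "J. Smith."),
]

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for task, doc, target in examples:
    prompt = f"### Instruction: {task}\n### Document: {doc}\n### Response: "
    batch = tokenizer(prompt + target, return_tensors="pt",
                      truncation=True, max_length=512)
    labels = batch["input_ids"].clone()
    # Mask prompt tokens with -100 so the loss covers only the response.
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels[:, :prompt_len] = -100
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Masking the prompt tokens is a common instruction-tuning choice: the model is penalized only for the assistant's response, not for reproducing the document itself.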
Why it matters?
This research is important because it demonstrates how advanced language models can be effectively deployed on smartphones, enhancing user experience by providing document assistance directly on devices. This approach can improve privacy since data processing happens locally on the device rather than in the cloud, and it can also reduce costs associated with server usage. The findings pave the way for future research and development in mobile AI applications.
Abstract
While small language models (SLMs) show promise for mobile deployment, their real-world performance and applications on smartphones remain underexplored. We present SlimLM, a series of SLMs optimized for document assistance tasks on mobile devices. Through extensive experiments on a Samsung Galaxy S24, we identify the optimal trade-offs between model size (ranging from 125M to 7B parameters), context length, and inference time for efficient on-device processing. SlimLM is pre-trained on SlimPajama-627B and fine-tuned on DocAssist, our constructed dataset for summarization, question answering, and suggestion tasks. Our smallest model demonstrates efficient performance on the S24, while larger variants offer enhanced capabilities within mobile constraints. We evaluate SlimLM against existing SLMs, showing comparable or superior performance and offering a benchmark for future research in on-device language models. We also provide an Android application, offering practical insights into SLM deployment. Our findings provide valuable insights and illuminate the capabilities of running advanced language models on high-end smartphones, potentially reducing server costs and enhancing privacy through on-device processing.
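As a rough illustration of the size/context-length/latency study the abstract describes, the sketch below times greedy decoding across prompt lengths for two stand-in checkpoints. The model names, context lengths, and host-machine measurement are assumptions for illustration; the paper's actual measurements come from on-device runs on a Galaxy S24.

```python
# Hypothetical sketch of a model-size vs. context-length vs. latency sweep.
# Runs on a host machine; real on-device numbers would differ substantially.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def tokens_per_second(model_name: str, context_len: int,
                      new_tokens: int = 32) -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    # Dummy prompt of the requested context length (content is irrelevant
    # for timing; we only care about sequence length).
    input_ids = torch.full((1, context_len), tokenizer.eos_token_id,
                           dtype=torch.long)
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(input_ids, max_new_tokens=new_tokens,
                       do_sample=False, pad_token_id=tokenizer.eos_token_id)
    return new_tokens / (time.perf_counter() - start)

# Stand-in checkpoints spanning two model sizes.
for name in ["EleutherAI/pythia-160m", "EleutherAI/pythia-410m"]:
    for ctx in [128, 512, 1024]:
        print(f"{name} ctx={ctx}: {tokens_per_second(name, ctx):.1f} tok/s")
```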