LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang, Sunghun Kim

2024-08-27

Summary

This paper introduces LlamaDuo, a system that helps transfer knowledge and abilities from large cloud-based language models to smaller, locally run models, making them easier to use without needing constant internet access.

What's the problem?

Many large language models (LLMs) are hosted in the cloud, which can lead to issues like needing a reliable internet connection, privacy concerns, and dependency on external services. This makes it difficult for users who want to run models on their own devices or in situations where internet access is limited.

What's the solution?

LlamaDuo provides a pipeline for migrating the knowledge of large cloud-based models into smaller ones that can run locally. It fine-tunes a small model on a synthetic dataset generated by the larger service model, so the small model performs well on specific downstream tasks. If the fine-tuned model falls short of expectations, the service model generates additional similar training data and the small model is fine-tuned again. Repeating this cycle allows the smaller model to eventually match or even exceed the larger model's performance on those tasks.
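The generate-tune-evaluate cycle can be sketched as a simple loop. The helper names below (`generate_synthetic_data`, `fine_tune`, `evaluate`) are hypothetical stand-ins, not the paper's actual API: in the real pipeline the first calls a service LLM, the second runs a fine-tuning framework, and the third scores the model on a held-out set, often with the service LLM as judge.

```python
def generate_synthetic_data(service_llm, n_examples):
    # Stand-in: the service LLM produces synthetic training examples.
    return [service_llm(i) for i in range(n_examples)]

def fine_tune(model_state, dataset):
    # Stand-in for a fine-tuning step; here we just accumulate the data
    # the toy "model" has been tuned on.
    return model_state + dataset

def evaluate(model_state):
    # Stand-in metric: more tuning data -> higher score.
    return len(model_state)

def llamaduo_loop(service_llm, threshold, batch=10, max_rounds=5):
    """Iteratively fine-tune until the small model meets the target score."""
    model_state = []
    for _ in range(max_rounds):
        data = generate_synthetic_data(service_llm, batch)
        model_state = fine_tune(model_state, data)
        if evaluate(model_state) >= threshold:
            break  # small model now meets expectations
    return model_state

model = llamaduo_loop(service_llm=lambda i: f"example-{i}", threshold=25)
print(len(model))  # 30: three rounds of 10 examples reached the threshold
```

The loop stops as soon as the evaluation clears the threshold, mirroring the paper's idea that additional synthetic data is requested only when the fine-tuned model underperforms.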

Why it matters?

This research is significant because it enables users to have more control over their language models, enhancing privacy and reducing reliance on cloud services. By making powerful AI tools accessible for local use, LlamaDuo can help various industries and individuals utilize AI technology more effectively and securely.

Abstract

The widespread adoption of cloud-based proprietary large language models (LLMs) has introduced significant challenges, including operational dependencies, privacy concerns, and the necessity of continuous internet connectivity. In this work, we introduce an LLMOps pipeline, "LlamaDuo", for the seamless migration of knowledge and abilities from service-oriented LLMs to smaller, locally manageable models. This pipeline is crucial for ensuring service continuity in the presence of operational failures, strict privacy policies, or offline requirements. Our LlamaDuo involves fine-tuning a small language model against the service LLM using a synthetic dataset generated by the latter. If the performance of the fine-tuned model falls short of expectations, it is enhanced by further fine-tuning with additional similar data created by the service LLM. This iterative process guarantees that the smaller model can eventually match or even surpass the service LLM's capabilities in specific downstream tasks, offering a practical and scalable solution for managing AI deployments in constrained environments. Extensive experiments with leading-edge LLMs are conducted to demonstrate the effectiveness, adaptability, and affordability of LlamaDuo across various downstream tasks. Our pipeline implementation is available at https://github.com/deep-diver/llamaduo.