A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Sihyeong Park, Sungryeol Jeon, Chaelyn Lee, Seokhun Jeon, Byung-Soo Kim, Jemin Lee
2025-05-06
Summary
This paper presents a broad survey of inference engines, the software systems that help large language models run faster and more efficiently.
What's the problem?
Running large language models is slow and demands a lot of computing resources, which makes it hard for people and companies to use them easily or cheaply.
What's the solution?
The researchers compared 25 open-source and commercial inference engines, examining how well they perform, how efficient they are, and which techniques they use to speed up inference, while also suggesting directions for future improvement.
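To make the idea concrete, here is a minimal, illustrative sketch of how such an engine is typically used, assuming vLLM (a well-known open-source inference engine) and an example model name; this code is not from the paper.

from vllm import LLM, SamplingParams

# The engine loads the model once and handles batching and scheduling internally.
llm = LLM(model="facebook/opt-125m")  # example model; any supported model works

# Standard generation knobs: sampling temperature and output length.
sampling = SamplingParams(temperature=0.8, max_tokens=64)

prompts = [
    "Explain what an inference engine does in one sentence.",
    "Why is serving large language models expensive?",
]

# The engine groups these prompts into batches to use the GPU efficiently.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)

The value an engine adds lies in what happens inside generate, such as request batching and memory management, which is the kind of optimization behavior the survey compares across the 25 systems.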
Why it matters?
Understanding which systems work best can help developers and businesses pick the right tools, save money, and make AI more accessible to everyone.
Abstract
This survey provides a comprehensive evaluation of 25 open-source and commercial LLM inference engines across various criteria and optimization techniques, and outlines future research directions.