A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Sihyeong Park, Sungryeol Jeon, Chaelyn Lee, Seokhun Jeon, Byung-Soo Kim, Jemin Lee
2025-05-06
Summary
This paper presents a broad survey of inference engines, the software systems that help large language models run faster and more efficiently.
What's the problem?
Running large language models is slow and demands a lot of computing resources, which makes it hard for people and companies to use them easily or cheaply.
What's the solution?
The researchers compared 25 open-source and commercial inference engines, examining how well they perform, how efficient they are, and which techniques they use to speed up inference, while also suggesting directions for future improvement.
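To make the idea concrete, here is a minimal, illustrative sketch of how such an engine is typically used, assuming vLLM (a well-known open-source inference engine) and an example model name; this code is not from the paper.

from vllm import LLM, SamplingParams

# The engine loads the model once and handles batching and scheduling internally.
llm = LLM(model="facebook/opt-125m")  # example model; any supported model works

# Standard generation knobs: sampling temperature and output length.
sampling = SamplingParams(temperature=0.8, max_tokens=64)

prompts = [
    "Explain what an inference engine does in one sentence.",
    "Why is serving large language models expensive?",
]

# The engine groups these prompts into batches to use the GPU efficiently.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)

The value an engine adds lies in what happens inside generate, such as request batching and memory management, which is the kind of optimization behavior the survey compares across the 25 systems.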
Why it matters?
Understanding which systems work best can help developers and businesses pick the right tools, save money, and make AI more accessible to everyone.
Abstract
This survey provides a comprehensive evaluation of 25 open-source and commercial LLM inference engines across various criteria and optimization techniques, and outlines future research directions.