Model-Preserving Adaptive Rounding
Albert Tseng, Zhaofeng Sun, Christopher De Sa
2025-05-30

Summary
This paper introduces YAQA, a new algorithm that makes large language models smaller and more efficient without losing much of their original ability, by using a smarter way to round their numbers during compression.
What's the problem?
When big AI models are shrunk so they use less memory and run faster, the usual way of rounding their numbers can throw away important information, so the compressed model ends up noticeably less accurate than the original.
What's the solution?
The researchers created YAQA, which uses an approximation of the whole model's Hessian (a measure of how sensitive the model's outputs are to each weight) to decide how to round every number so the compressed model stays as close as possible to the original. Instead of treating each layer in isolation, it accounts for how the whole model responds to changes, which helps keep its overall behavior the same after shrinking.
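In equations, the idea can be sketched as follows (notation assumed for illustration, not taken verbatim from the paper): W is a layer's original weight matrix, Ŵ its rounded version, Q the set of representable quantized matrices, and H a Hessian proxy that is approximated as a Kronecker product of two smaller factors A and B.

```latex
% Illustrative adaptive-rounding objective; notation assumed, not verbatim from the paper.
\min_{\widehat{W} \in \mathcal{Q}}
  \operatorname{vec}(\widehat{W} - W)^{\top} \, H \, \operatorname{vec}(\widehat{W} - W),
\qquad
H \;\approx\; A \otimes B .
% Purely layerwise schemes take H from that layer's calibration inputs alone;
% YAQA's H instead targets the full model's output (KL) divergence, with the
% Kronecker factors A and B keeping it tractable to estimate and store.
```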
Why does it matter?
This is important because it means we can run powerful AI models on smaller devices like phones or laptops without sacrificing much quality, making advanced AI more accessible and useful for everyone.
Abstract
YAQA, an adaptive rounding algorithm that uses Kronecker-factored approximations of each layer's Hessian with respect to the full model's KL divergence, reduces the quantized model's KL divergence from the original and improves downstream performance in post-training quantization of large language models.
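To illustrate the kind of rounding step this builds on, here is a minimal, self-contained Python sketch of Hessian-aware adaptive rounding with error feedback (a GPTQ/OBQ-style update). The function names, uniform grid, damping value, and the random Hessian proxy in the usage example are assumptions for illustration only; YAQA's actual rounding algorithm and its Kronecker-factored Hessian estimation differ in detail and are described in the paper.

```python
import numpy as np

def quantize_to_grid(x, scale):
    """Round to the nearest point on a uniform grid with step `scale`."""
    return np.round(x / scale) * scale

def hessian_aware_round(W, H, scale, damp=1e-2):
    """Round W (d_out x d_in) one input column at a time, compensating the
    not-yet-rounded columns for each column's rounding error via the inverse
    Hessian, in the spirit of GPTQ/LDLQ-style adaptive rounding."""
    d_out, d_in = W.shape
    W = W.copy()
    Q = np.zeros_like(W)
    # Damped inverse of the (symmetric) Hessian proxy over input dimensions.
    H_damped = H + damp * (np.trace(H) / d_in) * np.eye(d_in)
    Hinv = np.linalg.inv(H_damped)
    for j in range(d_in):
        Q[:, j] = quantize_to_grid(W[:, j], scale)
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # Push this column's error onto the remaining columns so the
        # Hessian-weighted error of the rounded layer stays small.
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q

# Tiny usage example with a random layer and a random PSD Hessian proxy.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16))
    X = rng.standard_normal((16, 256))
    H = X @ X.T / X.shape[1]          # stand-in for a Hessian estimate
    Q = hessian_aware_round(W, H, scale=0.1)
    print("max rounding error:", np.abs(W - Q).max())
```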