Fine-tuning Quantized Neural Networks with Zeroth-order Optimization

Sifeng Shang, Jiayi Zhou, Chenyu Lin, Minxian Li, Kaiyang Zhou

2025-05-21

Summary

This paper introduces Quantized Zeroth-order Optimization (QZO), a new way to fine-tune large language models that uses much less memory, making it possible to adapt these models even on computers that aren't very powerful.

What's the problem?

Fine-tuning big AI models usually requires a lot of memory, which makes it hard or even impossible for people with ordinary computers to update or improve these models for their own needs.

What's the solution?

To solve this, the researchers introduced QZO, a technique that cuts memory use on all three fronts: the model's weights are kept in quantized (compressed) form, and zeroth-order optimization estimates gradients from forward passes alone, so neither the gradients nor the optimizer's bookkeeping state needs to be stored. This allows people to fine-tune large models without needing expensive hardware.
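The zeroth-order idea at the heart of this approach can be sketched in a few lines. The toy below is a minimal SPSA-style illustration, not the paper's exact QZO algorithm: it estimates the gradient from two forward passes along a random direction, so no backpropagation, stored gradients, or optimizer state is needed. The function names and the toy quadratic loss are illustrative choices, not from the paper.

```python
import numpy as np

def zo_step(params, loss_fn, lr=1e-3, eps=1e-3, seed=0):
    """One zeroth-order update (SPSA-style sketch, not the paper's exact method).

    The gradient is estimated from two forward passes with a shared random
    perturbation z, so only the loss values are needed -- no backprop,
    no stored gradient tensor, no optimizer state.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)       # random search direction
    loss_plus = loss_fn(params + eps * z)       # forward pass at +eps
    loss_minus = loss_fn(params - eps * z)      # forward pass at -eps
    grad_proj = (loss_plus - loss_minus) / (2 * eps)  # gradient projected on z
    return params - lr * grad_proj * z          # SGD step along z

# Toy example: minimize ||w - target||^2 with forward passes only.
target = np.array([1.0, -2.0, 0.5])
loss = lambda w: float(np.sum((w - target) ** 2))

w = np.zeros(3)
for step in range(2000):
    w = zo_step(w, loss, lr=0.05, eps=1e-3, seed=step)
print(loss(w))  # close to 0 after training
```

In the full method, the same two-forward-pass trick is applied to a quantized model, so the memory cost reduces to little more than the compressed weights themselves.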

Why does it matter?

This matters because it lowers the barrier for more people and organizations to customize large language models, making advanced AI more accessible and useful for everyone.

Abstract

Quantized Zeroth-order Optimization (QZO) enables memory-efficient fine-tuning of large language models by minimizing memory usage on model weights, gradients, and optimizer states.