COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
Denis Makhov, Dmitriy Shopkhoev, Magauiya Zhussip, Ammar Ali, Baher Mohammad, Stamatios Lefkimmiatis
2026-02-18
Summary
This paper introduces a new method, COMPOT, for shrinking the size of Transformer models – the type of AI powering many modern applications – after they’ve already been trained, without significantly losing accuracy.
What's the problem?
Making these models smaller after training usually means approximating their weight matrices with simpler ones, most commonly by keeping only the largest components of a singular value decomposition (SVD). Because this forces every part of a layer into a single shared low-rank subspace, accuracy can drop noticeably even at moderate compression. More flexible sparse-dictionary methods exist, but they typically alternate many rounds of back-and-forth optimization between the dictionary and the coefficients, which makes them slow and complicated to apply.
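For context, here is a minimal NumPy sketch of the standard truncated-SVD baseline the paper compares against; it is not the authors' code, and the matrix shape and rank are illustrative assumptions only.

```python
import numpy as np

def truncated_svd_compress(W, rank):
    """Standard low-rank baseline: keep only the top-`rank` singular directions of W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out x rank) factor, columns scaled by singular values
    B = Vt[:rank, :]             # (rank x in) factor
    return A, B                  # W is replaced by A @ B, stored with rank*(out+in) numbers

W = np.random.randn(1024, 1024)                        # toy weight matrix
A, B = truncated_svd_compress(W, rank=128)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))   # relative reconstruction error
```

The shared-subspace limitation is visible here: every output of the layer must be expressed through the same `rank` directions, which is exactly what a union-of-subspaces (sparse dictionary) representation relaxes.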
What's the solution?
COMPOT instead finds a sparse representation of the model’s weights, guided by a small set of example inputs (calibration data). Because it uses orthogonal dictionaries, both the dictionary and the sparse coefficients can be computed in closed form, via a Procrustes update and a single sparse-coding step, so no iterative back-and-forth optimization is needed. It also decides how strongly to compress each individual layer based on how sensitive that layer is to changes, so a global compression budget is spent where it hurts accuracy the least.
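The two closed-form building blocks can be illustrated with a short NumPy sketch. This is not the authors' implementation: the objective below is plain reconstruction of a weight matrix, whereas COMPOT fits a calibration-weighted objective, and the shapes, sparsity level, and function names are illustrative assumptions.

```python
import numpy as np

def procrustes_update(W, C):
    """Closed-form orthogonal Procrustes step: argmin_D ||W - D C||_F s.t. D^T D = I."""
    U, _, Vt = np.linalg.svd(W @ C.T, full_matrices=False)
    return U @ Vt

def sparse_code(D, W, k):
    """Single-step sparse coding for a square orthogonal dictionary:
    C = D^T W is the exact least-squares solution, and keeping the k
    largest-magnitude entries per column is the optimal L0-constrained choice."""
    C = D.T @ W
    drop = np.argsort(np.abs(C), axis=0)[:-k, :]   # indices of the smallest entries
    np.put_along_axis(C, drop, 0.0, axis=0)
    return C

# Toy usage: one analytical coding step, one Procrustes refinement, one re-coding step.
W = np.random.randn(256, 1024)                      # a weight matrix
D = np.linalg.qr(np.random.randn(256, 256))[0]      # orthogonal initialization
C = sparse_code(D, W, k=32)
D = procrustes_update(W, C)
C = sparse_code(D, W, k=32)
print(np.linalg.norm(W - D @ C) / np.linalg.norm(W))
```

Both steps are single SVD/projection computations rather than iterative solvers, which is the property the method exploits.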
Why it matters?
This is important because large AI models require a lot of computing power and memory, making them expensive to run and limiting where they can be used. COMPOT offers a way to significantly reduce the size of these models with minimal loss in accuracy, making them more accessible and efficient, and even allowing for further size reductions when combined with other compression techniques.
Abstract
Post-training compression of Transformer models commonly relies on truncated singular value decomposition (SVD). However, enforcing a single shared subspace can degrade accuracy even at moderate compression. Sparse dictionary learning provides a more flexible union-of-subspaces representation, but existing approaches typically rely on costly iterative updates of the dictionary and coefficients. We propose COMPOT (Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers), a training-free compression framework that uses a small calibration dataset to estimate a sparse weight factorization. COMPOT employs orthogonal dictionaries that enable closed-form Procrustes updates for the dictionary and analytical single-step sparse coding for the coefficients, eliminating iterative optimization. To handle heterogeneous layer sensitivity under a global compression budget, COMPOT further introduces a one-shot dynamic allocation strategy that adaptively redistributes layer-wise compression rates. Extensive experiments across diverse architectures and tasks show that COMPOT consistently delivers a superior quality-compression trade-off over strong low-rank and sparse baselines, while remaining fully compatible with post-training quantization for extreme compression. Code is available at https://github.com/mts-ai/COMPOT.
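To make the budget-allocation idea concrete, the sketch below assigns each layer a keep ratio proportional to a hypothetical per-layer sensitivity score and rescales so that a global parameter budget is met. This is only a plausible heuristic under stated assumptions, not the paper's allocation rule; the function name, clipping bounds, and sensitivity scores are invented for illustration.

```python
import numpy as np

def allocate_keep_ratios(sensitivity, n_params, global_keep=0.5,
                         lo=0.1, hi=1.0, iters=20):
    """One-shot allocation sketch: more sensitive layers keep more parameters,
    while the parameter-weighted average keep ratio matches global_keep."""
    s = np.asarray(sensitivity, dtype=float)
    n = np.asarray(n_params, dtype=float)
    ratios = s / s.mean() * global_keep            # proportional first guess
    for _ in range(iters):                         # alternate clipping and rescaling
        ratios = np.clip(ratios, lo, hi)
        current = (ratios * n).sum() / n.sum()
        ratios *= global_keep / current
    return np.clip(ratios, lo, hi)

# Toy usage: four equally sized layers, the last one markedly more sensitive.
print(allocate_keep_ratios(sensitivity=[1.0, 1.2, 0.8, 3.0],
                           n_params=[4e6, 4e6, 4e6, 4e6]))
```

The point of the illustration is that the per-layer rates are fixed in a single pass from measured sensitivities, with no retraining or repeated search over allocations.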