A 58-Addition, Rank-23 Scheme for General 3x3 Matrix Multiplication
A. I. Perminov
2025-12-29
Summary
This paper introduces a faster and more efficient way to multiply 3x3 matrices, which are arrangements of numbers in rows and columns, especially when dealing with more complex mathematical systems than simple numbers.
What's the problem?
Multiplying matrices is a fundamental operation in many areas of math and computer science, but it can be computationally expensive, meaning it takes a lot of processing power. Existing methods for multiplying 3x3 matrices involved a certain number of individual addition steps, and researchers were trying to find a way to reduce that number to make the process faster.
What's the solution?
The researchers developed a new algorithm, a set of instructions for performing the matrix multiplication, that requires fewer addition steps than previous methods. They used a computer program to automatically search for the best possible arrangement of operations, ultimately finding a method that uses only 58 additions, which is an improvement over the previous best of 60. This new method also cleverly uses only the numbers -1, 0, and 1, making it work well in a variety of mathematical situations.
Why it matters?
Reducing the number of operations needed for matrix multiplication can have a significant impact on performance in many applications, such as computer graphics, scientific simulations, and machine learning. Even a small reduction in the number of additions can lead to faster processing times and more efficient use of computing resources. This new algorithm provides a more efficient and versatile method for matrix multiplication.
Abstract
This paper presents a new state-of-the-art algorithm for exact 3times3 matrix multiplication over general non-commutative rings, achieving a rank-23 scheme with only 58 scalar additions. This improves the previous best additive complexity of 60 additions without a change of basis. The result was discovered through an automated search combining ternary-restricted flip-graph exploration with greedy intersection reduction for common subexpression elimination. The resulting scheme uses only coefficients from {-1, 0, 1}, ensuring both efficiency and portability across arbitrary fields. The total scalar operation count is reduced from 83 to 81.