Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach

Fast matrix multiplication can be cast as a search for low-rank decompositions of the matrix–multiplication tensor. We design a neural architecture, \textsc{StrassenNet}, which reproduces Strassen's algorithm for $2\times 2$ multiplication. Across many independent runs the network consistently converges to a rank-$7$ decomposition, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on $3\times 3$ multiplication with rank $r\in\{19,\dots,23\}$. Our experiments reveal a clear numerical threshold: models with $r=23$ attain significantly lower validation error than those with $r\le 22$, suggesting that $r=23$ may be the smallest rank recoverable by this approach for the $3\times 3$ matrix–multiplication tensor.
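As a concrete illustration of the rank-$7$ decomposition the network recovers, here is Strassen's classical $2\times 2$ scheme written out and checked numerically. This sketch is ours, not the paper's code; the function name `strassen_2x2` is an assumption for illustration.

```python
import numpy as np

def strassen_2x2(A, B):
    # Strassen's rank-7 bilinear algorithm: 7 scalar multiplications
    # instead of the naive 8 for a 2x2 matrix product.
    a11, a12, a21, a22 = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    b11, b12, b21, b22 = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Each entry of C = A @ B is a linear combination of the 7 products.
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])
```

Applied recursively to $2\times 2$ block partitions, this scheme yields the familiar $O(n^{\log_2 7})$ multiplication algorithm.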
We also sketch an extension of the method to border-rank decompositions via an $\varepsilon$–parametrisation and report preliminary results consistent with the known bounds on the border rank of the $3\times 3$ matrix–multiplication tensor.
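To make the $\varepsilon$–parametrisation idea concrete, consider the textbook example of a tensor whose border rank is strictly below its rank: $T = x\otimes x\otimes y + x\otimes y\otimes x + y\otimes x\otimes x$ has rank $3$ but is the limit, as $\varepsilon\to 0$, of the rank-$2$ family $\varepsilon^{-1}\big((x+\varepsilon y)^{\otimes 3} - x^{\otimes 3}\big)$. This numerical check is our own sketch, not the paper's experiment:

```python
import numpy as np

def rank_one(u, v, w):
    # Rank-one tensor u (x) v (x) w.
    return np.einsum('i,j,k->ijk', u, v, w)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# Target: rank-3 tensor with border rank 2.
T = rank_one(x, x, y) + rank_one(x, y, x) + rank_one(y, x, x)

def approx(eps):
    # Rank-2 family converging to T as eps -> 0 (error is O(eps)).
    xe = x + eps * y
    return (rank_one(xe, xe, xe) - rank_one(x, x, x)) / eps

for eps in (1e-1, 1e-3, 1e-5):
    print(eps, np.abs(approx(eps) - T).max())
```

The printed error shrinks linearly with $\varepsilon$, which is the behaviour an $\varepsilon$–parametrised decomposition of the matrix–multiplication tensor would exhibit at its border rank.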
