Error Analysis of Matrix Multiplication Emulation Using Ozaki-II Scheme

arXiv:2602.02549v1 Announce Type: new
Abstract: The Ozaki-II scheme is an emulation method that leverages the Chinese Remainder Theorem to compute high-precision matrix multiplication via a sequence of low-precision matrix multiplications. In this scheme, the attainable numerical accuracy improves as the number of low-precision matrix multiplications increases. Previous numerical studies have shown that single- and double-precision matrix multiplication emulated with the Ozaki-II scheme achieves higher throughput than standard BLAS routines on modern AI hardware equipped with fast matrix multiply-accumulate units that take INT8 inputs and accumulate in INT32. However, the accuracy of the Ozaki-II scheme can degrade when the exponent distribution of the input matrices is wide, in which case a large number of low-precision matrix multiplications is required to obtain high-precision results. In this paper, we present a rigorous deterministic error analysis of the Ozaki-II scheme. The proposed analysis not only clarifies the accuracy behavior of the method but also enables estimation of the number of low-precision matrix multiplications required to achieve a desired level of numerical accuracy.
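
The Chinese Remainder Theorem idea mentioned in the abstract can be illustrated with a minimal integer-only sketch. It is not the paper's algorithm: the scaling and truncation of floating-point inputs to integers is omitted, and the moduli list and the helper names (`sym_mod`, `matmul`, `crt_matmul`) are assumptions chosen for illustration. Each modulus contributes one small-integer matrix product whose operands fit in INT8, and the exact product is recovered only when the product of the moduli exceeds twice the largest entry magnitude, which is why more low-precision multiplications buy more accuracy.

```python
from math import prod

# Hypothetical choice: pairwise coprime moduli no larger than 256,
# so every symmetric residue fits in a signed 8-bit integer.
MODULI = [256, 255, 253, 251, 247, 241, 239, 233]


def sym_mod(x, m):
    """Symmetric residue of x modulo m, lying in [-(m // 2), m - m // 2 - 1]."""
    return (x + m // 2) % m - m // 2


def matmul(a, b):
    """Plain integer matrix product (stand-in for an INT8 GEMM with INT32 accumulation)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def crt_matmul(a, b, moduli):
    """Reconstruct the integer product a @ b from one small product per modulus.

    The result is exact only if every entry of a @ b has magnitude
    below prod(moduli) / 2; otherwise it wraps around.
    """
    big_m = prod(moduli)
    rows, cols = len(a), len(b[0])
    acc = [[0] * cols for _ in range(rows)]
    for m in moduli:
        # Reduce the inputs to INT8-sized residues and multiply them.
        a_m = [[sym_mod(x, m) for x in row] for row in a]
        b_m = [[sym_mod(x, m) for x in row] for row in b]
        c_m = matmul(a_m, b_m)
        # CRT weight: congruent to 1 modulo m and to 0 modulo every other modulus.
        q = big_m // m
        w = q * pow(q, -1, m)  # modular inverse via pow needs Python 3.8+
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += w * (c_m[i][j] % m)
    # Map the accumulated value back to the symmetric range around zero.
    return [[sym_mod(v, big_m) for v in row] for row in acc]
```

Calling `crt_matmul(a, b, MODULI)` on integer matrices whose product entries are around 10^10 in magnitude reproduces `matmul(a, b)` exactly, while `crt_matmul(a, b, MODULI[:3])` does not, because the product of the first three moduli (16,515,840) is too small to represent those entries.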
