A Sketch-and-Project Analysis of Subsampled Natural Gradient Algorithms

arXiv:2508.21022v2 Announce Type: replace-cross
Abstract: Subsampled natural gradient descent (SNG) has been used to enable high-precision scientific machine learning, but standard analyses based on stochastic preconditioning fail to provide insight into realistic small-sample settings. We overcome this limitation by instead analyzing SNG as a sketch-and-project method. Motivated by this lens, we discard the usual theoretical proxy that decouples gradients and preconditioners using two independent mini-batches, and we replace it with a new proxy based on squared volume sampling. Under this new proxy we show that the expectation of the SNG direction equals a preconditioned gradient descent step even in the presence of coupling, leading to (i) global convergence guarantees when using a single mini-batch of any size, and (ii) an explicit characterization of the convergence rate in terms of quantities related to the sketch-and-project structure. These findings in turn yield new insights into small-sample settings, for example by suggesting that the advantage of SNG over SGD is that it can more effectively exploit spectral decay in the model Jacobian. We also extend these ideas to explain a popular structured momentum scheme for SNG, known as SPRING, by showing that it arises naturally from accelerated sketch-and-project methods.
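To make the sketch-and-project reading of an SNG step concrete, here is a minimal NumPy sketch under simplifying assumptions: a linearized least-squares setting where the update is computed from a mini-batch of rows of the model Jacobian via a small batch-by-batch solve. The names (sng_step, damping, batch_idx) and the added damping term are illustrative choices for this sketch, not the paper's notation or exact algorithm.

```python
import numpy as np

def sng_step(jacobian, residual, batch_idx, damping=1e-4):
    """One illustrative subsampled natural gradient (SNG) step in
    sketch-and-project form: restrict to a mini-batch of Jacobian rows
    and solve the resulting small (batch x batch) system."""
    J_S = jacobian[batch_idx]          # (b, p) sketched Jacobian rows
    r_S = residual[batch_idx]          # (b,)   sketched residuals
    # Kernel-trick form of the (damped) pseudoinverse step:
    #   J_S^T (J_S J_S^T + damping * I)^{-1} r_S.
    # The linear solve is only b-by-b, independent of the parameter count p.
    gram = J_S @ J_S.T + damping * np.eye(len(batch_idx))
    return J_S.T @ np.linalg.solve(gram, r_S)

# Toy usage: a random linear least-squares problem and one SNG update.
rng = np.random.default_rng(0)
n, p, b = 200, 50, 16
J = rng.standard_normal((n, p))
theta_star = rng.standard_normal(p)
theta = np.zeros(p)
residual = J @ (theta_star - theta)    # residual of the linearized model
batch = rng.choice(n, size=b, replace=False)
theta = theta + sng_step(J, residual, batch)
```

In this form the step projects the parameters toward the solution of the mini-batch constraints, which is the sketch-and-project structure the abstract appeals to; how the batch is drawn (e.g., the squared-volume-sampling proxy) is what the paper's analysis addresses.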
