Bias-Optimal Bounds for SGD: A Computer-Aided Lyapunov Analysis

arXiv:2505.17965v2 Announce Type: replace-cross
Abstract: The non-asymptotic analysis of Stochastic Gradient Descent (SGD) typically yields bounds that decompose into a bias term and a variance term. In this work, we focus on the bias component and study the extent to which SGD can match the optimal convergence behavior of deterministic gradient descent. Assuming only (strong) convexity and smoothness of the objective, we derive new bounds that are bias-optimal, in the sense that the bias term coincides with the worst-case rate of gradient descent. Our results hold for the full range of constant step-sizes $\gamma L \in (0,2)$, including critical and large step-size regimes that were previously unexplored without additional variance assumptions. The bounds are obtained through the construction of a simple Lyapunov energy whose monotonicity yields sharp convergence guarantees. To design the parameters of this energy, we employ the Performance Estimation Problem framework, which we also use to provide numerical evidence for the optimality of the associated variance terms.
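The bias/variance decomposition described in the abstract can be made concrete with a small simulation. The sketch below (not the paper's Lyapunov or PEP construction; the quadratic objective, noise level, and step size are assumed purely for illustration) runs constant step-size SGD with $\gamma L$ inside $(0,2)$ and compares the mean error, which contracts at the deterministic gradient-descent rate, with the residual spread driven by gradient noise.

```python
import numpy as np

# Illustrative sketch only: constant step-size SGD on a strongly convex
# quadratic f(x) = 0.5 * L * x^2 with additive gradient noise of variance
# sigma^2. The step size is chosen so that gamma * L lies in (0, 2), the
# range of constant step sizes discussed in the abstract.
rng = np.random.default_rng(0)

L = 1.0          # smoothness constant (also the curvature of this quadratic)
sigma = 0.1      # gradient-noise standard deviation (assumed)
gamma = 1.5 / L  # step size with gamma * L = 1.5, inside (0, 2)
x0 = 5.0
T = 200
runs = 5000      # number of independent SGD trajectories

def grad(x):
    return L * x  # gradient of 0.5 * L * x^2

# Deterministic gradient descent serves as the bias reference.
x_gd = x0
# SGD trajectories, averaged to separate bias (mean error) from variance (spread).
x_sgd = np.full(runs, x0)

for _ in range(T):
    x_gd = x_gd - gamma * grad(x_gd)
    noise = sigma * rng.standard_normal(runs)
    x_sgd = x_sgd - gamma * (grad(x_sgd) + noise)

bias_sq = np.mean(x_sgd) ** 2  # squared mean error, tracks the GD contraction
variance = np.var(x_sgd)       # residual spread caused by the gradient noise
print(f"GD squared error:  {x_gd**2:.3e}")
print(f"SGD bias^2 (est.): {bias_sq:.3e}")
print(f"SGD variance term: {variance:.3e}")
```

In this toy setting the mean of the SGD iterates contracts by the same factor $|1-\gamma L|$ as gradient descent, while the variance settles near a noise-driven plateau; the paper's contribution is to establish bias terms matching the worst-case gradient-descent rate under only smoothness and (strong) convexity, across the full step-size range.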
