Stopping Rules for SGD via Anytime-Valid Confidence Sequences

arXiv:2512.13123v4 Announce Type: replace-cross
Abstract: Deciding when to stop stochastic gradient descent (SGD) has long remained unresolved in a statistically rigorous sense. While SGD is routinely monitored as it runs, the classical theory of SGD provides guarantees only at pre-specified iteration horizons and offers no valid way to decide, based on the observed trajectory, when further computation is justified. We address this gap by developing anytime-valid confidence sequences for stochastic gradient methods, which remain valid under continuous monitoring and directly induce statistically valid, trajectory-dependent stopping rules: stop as soon as the current upper confidence bound on an appropriate performance measure falls below a user-specified tolerance. The confidence sequences are constructed using nonnegative supermartingales, are time-uniform, and depend only on observable quantities along the SGD trajectory, without requiring prior knowledge of the optimization horizon. In convex optimization, this yields anytime-valid certificates for weighted suboptimality of projected SGD under general stepsize schedules, without assuming smoothness or strong convexity. In nonconvex optimization, it yields time-uniform certificates for weighted first-order stationarity under smoothness assumptions. We further characterize the stopping-time complexity of the resulting stopping rules under standard stepsize schedules. To the best of our knowledge, this is the first framework that provides statistically valid, time-uniform stopping rules for SGD across both convex and nonconvex settings based solely on its observed trajectory.

Liked Liked