Tight Bounds for Logistic Regression with Large Stepsize Gradient Descent in Low Dimension

arXiv:2602.12471v1 Announce Type: new
Abstract: We consider the optimization problem of minimizing the logistic loss with gradient descent to train a linear model for binary classification with separable data. With a budget of $T$ iterations, it was recently shown that an accelerated $1/T^2$ rate is possible by choosing a large step size $\eta = \Theta(\gamma^2 T)$ (where $\gamma$ is the dataset's margin), despite the resulting non-monotonicity of the loss. In this paper, we provide a tighter analysis of gradient descent for this problem when the data is two-dimensional: we show that GD with a sufficiently large step size $\eta$ finds a point with loss smaller than $\mathcal{O}(1/(\eta T))$, as long as $T \geq \Omega(n/\gamma + 1/\gamma^2)$, where $n$ is the dataset size. Our improved rate comes from a tighter bound on the time $\tau$ that it takes for GD to transition from unstable (non-monotonic loss) to stable (monotonic loss), via a fine-grained analysis of the oscillatory dynamics of GD in the subspace orthogonal to the max-margin classifier. We also provide a lower bound on $\tau$ matching our upper bound up to logarithmic factors, showing that our analysis is tight.
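The setting described above can be illustrated with a small numerical sketch. The snippet below is not code from the paper; it runs plain gradient descent on the logistic loss over a synthetic two-dimensional separable dataset with the large step size $\eta = \gamma^2 T$, so that the early non-monotonic ("unstable") phase and the later monotonic ("stable") phase can be observed. The dataset construction, the margin value, and all parameter choices are illustrative assumptions.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Hypothetical 2D linearly separable dataset with margin roughly `gamma`
# along the first coordinate (the max-margin direction in this toy example).
n, gamma = 50, 0.5
y = rng.choice([-1.0, 1.0], size=n)
x1 = y * (gamma + rng.random(n))      # signed distance along the margin direction
x2 = rng.normal(scale=1.0, size=n)    # orthogonal direction, where oscillations occur
X = np.stack([x1, x2], axis=1)

def logistic_loss(w):
    # Average logistic loss; logaddexp(0, z) is a stable log(1 + exp(z)).
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    # Gradient of the average logistic loss: -mean_i sigmoid(-y_i <w, x_i>) y_i x_i.
    s = expit(-y * (X @ w))
    return -(X * (s * y)[:, None]).mean(axis=0)

T = 2000
eta = gamma**2 * T                    # large step size, eta = Theta(gamma^2 T)
w = np.zeros(2)
losses = []
for t in range(T):
    losses.append(logistic_loss(w))
    w -= eta * grad(w)

# Early iterations are typically non-monotonic (unstable phase); after some
# time tau the loss decreases monotonically (stable phase).
print("first 5 losses:", np.round(losses[:5], 3))
print("final loss:    ", losses[-1])
```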
