Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds

arXiv:2603.06651v1 Announce Type: new
Abstract: Euclidean gradient descent algorithms capture little of the geometry of objective function-induced hypersurfaces and risk driving update trajectories off those hypersurfaces. Riemannian gradient descent algorithms address these issues, but a single classic manifold cannot represent complex hypersurfaces. We propose geodesic gradient descent (GGD), a generic and learning-rate-free Riemannian gradient descent algorithm. At each iteration, GGD approximates a local neighborhood of the objective function-induced hypersurface with an n-dimensional sphere, which lets it adapt to arbitrarily complex geometries. A tangent vector derived from the Euclidean gradient is projected onto the sphere to form a geodesic, ensuring the update trajectory stays on the hypersurface. Parameters are updated to the endpoint of the geodesic. The maximum step size in GGD equals a quarter of the arc length on the n-dimensional sphere, which eliminates the need for a learning rate. Experiments show that, compared with the classic Adam algorithm, GGD reduces test MSE by 35.79% to 48.76% for fully connected networks on the Burgers’ dataset and reduces cross-entropy loss by 3.14% to 11.59% for convolutional neural networks on the MNIST dataset.
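
The geodesic update described in the abstract can be illustrated with the standard exponential map on a sphere. The sketch below is not the authors' implementation: the abstract does not say how GGD fits the local sphere or derives the tangent vector, so the function name `sphere_geodesic_step`, its parameters, and the reading of "a quarter of the arc length" as a quarter of a great circle are all assumptions made for illustration. It only shows how a point moves along a great circle in a given tangent direction, with the arc length capped.

```python
import numpy as np


def sphere_geodesic_step(p, center, v, max_fraction=0.25):
    """Hypothetical helper: move p along a great circle of the sphere
    centered at `center` that passes through p.

    v is a direction in the ambient space; its tangential part at p sets
    the direction and the requested arc length. The arc length is capped
    at max_fraction of the great-circle circumference (assumed reading of
    the quarter-arc bound in the abstract).
    """
    u = p - center                      # radial vector from center to p
    r = np.linalg.norm(u)               # sphere radius
    n = u / r                           # unit normal at p
    t = v - np.dot(v, n) * n            # remove the radial component of v
    t_norm = np.linalg.norm(t)
    if t_norm == 0.0:
        return p.copy()                 # no tangential direction: stay put
    arc = min(t_norm, max_fraction * 2.0 * np.pi * r)
    theta = arc / r                     # angle swept along the great circle
    # Exponential map on the sphere: rotate u toward t by angle theta.
    return center + np.cos(theta) * u + np.sin(theta) * r * (t / t_norm)


p = np.array([1.0, 0.0, 0.0])
center = np.zeros(3)
small = sphere_geodesic_step(p, center, np.array([0.0, 0.1, 0.0]))
large = sphere_geodesic_step(p, center, np.array([0.0, 10.0, 0.0]))
```

In this toy setup both endpoints stay on the unit sphere, and the oversized direction is clipped to a quarter turn, so no separate learning rate is needed to bound the step.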
