Neural network optimization strategies and the topography of the loss landscape
arXiv:2602.21276v1 Announce Type: cross
Abstract: Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in machine learning is the performance of trained neural networks on previously unseen test data. Here, we investigate neural network training by stochastic gradient descent (SGD) – a non-convex global optimization algorithm which relies only on the gradient […]
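As a rough illustration of the setting the abstract describes, the following is a minimal sketch of minibatch SGD on a small non-convex loss, followed by an evaluation on held-out data. It is not the paper's method or experimental setup. The two-parameter model `w1 * w2 * x`, the synthetic data, and all names and hyperparameters (`sgd`, `loss`, `lr`, `batch_size`, and so on) are assumptions chosen only to keep the example self-contained.

```python
import random


def loss(w1, w2, data):
    """Mean squared error of the toy model f(x) = w1 * w2 * x (non-convex in w1, w2)."""
    return sum((w1 * w2 * x - y) ** 2 for x, y in data) / len(data)


def sgd(data, w1=0.5, w2=0.5, lr=0.05, steps=2000, batch_size=8, seed=0):
    """Minibatch SGD: each update uses only the gradient estimated on a random batch."""
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        g1 = g2 = 0.0
        for x, y in batch:
            err = w1 * w2 * x - y
            g1 += 2 * err * w2 * x  # d(loss)/d(w1) for this sample
            g2 += 2 * err * w1 * x  # d(loss)/d(w2) for this sample
        w1 -= lr * g1 / batch_size
        w2 -= lr * g2 / batch_size
    return w1, w2


# Hypothetical synthetic data: targets follow y = 2x, so any (w1, w2) with w1 * w2 = 2 is a minimum.
data_rng = random.Random(1)
train = [(x, 2.0 * x) for x in (data_rng.uniform(-1, 1) for _ in range(64))]
test = [(x, 2.0 * x) for x in (data_rng.uniform(-1, 1) for _ in range(64))]

w1, w2 = sgd(train)
print("train loss:", loss(w1, w2, train))
print("test loss: ", loss(w1, w2, test))
```

Because the minima of this toy loss form a whole curve `w1 * w2 = 2` rather than a single point, the solution SGD reaches depends on the starting point and the sampled batches, which is a small-scale analogue of the landscape questions the abstract raises.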