SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures
arXiv:2505.09572v3 Announce Type: replace-cross Abstract: We study gradient flows for loss landscapes of fully connected feedforward neural networks with commonly used continuously differentiable activation functions such as the logistic, hyperbolic tangent, softplus, or GELU function. We prove that the gradient flow either converges to a critical point or diverges to infinity while the loss converges to an asymptotic critical value. Moreover, we prove the existence of a threshold $\varepsilon>0$ such that the loss value of any gradient flow […]
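The divergent regime described in the abstract can be illustrated numerically. The sketch below (not from the paper; all names and the toy setup are my own) uses forward-Euler gradient descent with a small step size as a discretization of the gradient flow on a single logistic unit fitting an unattainable target: the parameter $w$ diverges to infinity while the loss converges to its asymptotic critical value $0$.

```python
# Minimal sketch, assuming a one-parameter logistic model s(w) with squared
# error against the target y = 1. The infimum loss 0 is reached only as
# w -> infinity, so the (discretized) gradient flow diverges while the loss
# converges to an asymptotic critical value.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(w):
    # squared error against target 1; not attained at any finite w
    return (sigmoid(w) - 1.0) ** 2

def grad(w):
    # d/dw (sigmoid(w) - 1)^2 = 2 (s - 1) s (1 - s)
    s = sigmoid(w)
    return 2.0 * (s - 1.0) * s * (1.0 - s)

w, lr = 0.0, 0.5  # Euler step of the gradient flow dw/dt = -grad(w)
for step in range(20001):
    w -= lr * grad(w)
    if step % 5000 == 0:
        print(f"step {step:6d}  w = {w:10.4f}  loss = {loss(w):.3e}")
```

Running this, $w$ grows without bound (roughly logarithmically in the number of steps) while the loss decays toward $0$, a toy instance of the trichotomy the abstract states: converge to a critical point, or diverge while the loss approaches an asymptotic critical value.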