Curse of Dimensionality in Neural Network Optimization
arXiv:2502.05360v3 Announce Type: replace-cross Abstract: This paper demonstrates that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ denotes the time parameter of the gradient flow dynamics. This result highlights the presence of the curse of dimensionality in the optimization computation […]
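The rate $t^{-\frac{4r}{d-2r}}$ can be probed numerically. The sketch below is a hypothetical illustration, not the paper's construction or experiment: it discretizes the gradient flow by small-step gradient descent on a shallow tanh network (tanh is 1-Lipschitz), fits a smooth stand-in target on $[0,1]^d$, and prints the empirical risk alongside the theoretical floor. All choices here (dimension d, width, sample size, step size, and the target function) are illustrative assumptions.

```python
# Hypothetical sketch: Euler discretization of gradient flow on a shallow
# tanh network, compared against the lower-bound rate t^{-4r/(d - 2r)}.
import numpy as np

rng = np.random.default_rng(0)

d, width, n = 8, 64, 512            # input dim, hidden width, sample size (illustrative)
r = 2                               # assumed smoothness order of the target
X = rng.uniform(0.0, 1.0, (n, d))   # samples from [0, 1]^d
y = np.sin(X.sum(axis=1))           # smooth (hence C^r) stand-in target

# Shallow network f(x) = a^T tanh(W x + b).
W = rng.normal(0.0, 1.0 / np.sqrt(d), (width, d))
b = np.zeros(width)
a = rng.normal(0.0, 1.0 / np.sqrt(width), width)

eta, steps = 1e-2, 20000            # small step size approximates gradient flow
for k in range(1, steps + 1):
    H = np.tanh(X @ W.T + b)        # hidden activations, shape (n, width)
    err = H @ a - y
    risk = 0.5 * np.mean(err ** 2)  # empirical squared-loss risk
    # Gradients of the empirical risk with respect to all parameters.
    ga = H.T @ err / n
    gZ = np.outer(err, a) * (1.0 - H ** 2)  # backprop through tanh
    gW = gZ.T @ X / n
    gb = gZ.mean(axis=0)
    a -= eta * ga
    W -= eta * gW
    b -= eta * gb
    if k % 2000 == 0:
        t = k * eta                 # flow time ~ (step count) x (step size)
        floor = t ** (-4 * r / (d - 2 * r))  # theoretical lower-bound rate
        print(f"t={t:7.1f}  risk={risk:.3e}  t^(-4r/(d-2r))={floor:.3e}")
```

Since the bound is a worst-case statement, a particular run may decay faster than the printed floor; the theorem asserts only that some $C^r$ target forces the rate $t^{-\frac{4r}{d-2r}}$, which degrades toward no decay as $d$ grows relative to $r$.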