Hidden Minima in Two-Layer ReLU Networks
arXiv:2312.16819v4 Announce Type: replace-cross Abstract: We consider the optimization problem associated with training two-layer ReLU networks with d inputs under the squared loss, where the labels are generated by a target network. Recent work has identified two distinct classes of infinite families of minima: one whose training loss vanishes in the high-dimensional limit, and another whose loss remains bounded away from zero. The latter family is empirically avoided by stochastic gradient descent, hence "hidden", motivating the search for […]
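The setting described above can be sketched numerically. The following is a minimal teacher-student illustration, not the paper's construction: a fixed "target" two-layer ReLU network generates labels, and a student network of the same form is trained on the squared loss by plain gradient descent. All sizes, the data distribution, and the absence of output weights are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k_teacher, k_student, n = 10, 4, 6, 256  # hypothetical sizes

def relu(z):
    return np.maximum(z, 0.0)

# Fixed teacher: y(x) = sum_j relu(v_j . x), with hidden weights V
V = rng.standard_normal((k_teacher, d)) / np.sqrt(d)

def teacher(X):
    return relu(X @ V.T).sum(axis=1)

# Student: f(x; W) = sum_i relu(w_i . x), trained on squared loss
def loss(W, X, y):
    return 0.5 * np.mean((relu(X @ W.T).sum(axis=1) - y) ** 2)

def grad(W, X, y):
    pre = X @ W.T                       # (n, k_student) pre-activations
    r = relu(pre).sum(axis=1) - y       # per-sample residuals
    mask = (pre > 0).astype(X.dtype)    # ReLU derivative
    return (mask * r[:, None]).T @ X / len(X)

X = rng.standard_normal((n, d))
y = teacher(X)
W = rng.standard_normal((k_student, d)) / np.sqrt(d)

loss_init = loss(W, X, y)
for _ in range(2000):                   # full-batch gradient descent
    W -= 0.05 * grad(W, X, y)
loss_final = loss(W, X, y)
```

Which minimum such a run converges to, and whether its loss vanishes as d grows, is exactly the kind of question the abstract raises; this sketch only sets up the loss landscape being studied.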