Robust Learning of a Group DRO Neuron
We study the problem of learning a single neuron under the standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a "best-fit" neuron parameterized by $\mathbf{w}_*$ that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization problem: given sample access to $K$ distinct distributions $\mathcal{p}_{[1]},\dots,\mathcal{p}_{[K]}$, we seek to approximate the $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbol{\lambda} \in \Delta_K$, where the objective is $\sum_{i \in [K]} \lambda_{[i]}\, \mathbb{E}_{(\mathbf{x},y)\sim\mathcal{p}_{[i]}}(\sigma(\mathbf{w}\cdot\mathbf{x})-y)^2 - \nu\, d_f(\boldsymbol{\lambda}, \tfrac{1}{K}\mathbf{1})$ and $d_f$ is an $f$-divergence that imposes an (optional) penalty on deviations from uniform group weights, scaled by a parameter $\nu \geq 0$. We develop a computationally efficient primal-dual algorithm that outputs a vector $\widehat{\mathbf{w}}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. An implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.
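As a concrete illustration of the Group DRO objective above, the following is a minimal sketch of a generic primal-dual loop, not the paper's algorithm: gradient descent on the neuron weights $\mathbf{w}$ and exponentiated-gradient (mirror) ascent on the group weights $\boldsymbol{\lambda}$, with a KL penalty toward the uniform distribution standing in for the $f$-divergence term. The activation choice (ReLU), the step sizes, and all function names are illustrative assumptions.

```python
# A minimal sketch (not the paper's algorithm) of a primal-dual update for a
# Group DRO objective of the form above. Assumptions: ReLU activation, KL
# divergence toward uniform as the f-divergence penalty, fixed step sizes.
import numpy as np

def sigma(z):                      # example activation: ReLU (an assumption)
    return np.maximum(z, 0.0)

def group_loss(w, X, y):           # squared loss of the neuron on one group
    return np.mean((sigma(X @ w) - y) ** 2)

def primal_dual_gdro(groups, d, nu=0.1, eta_w=1e-2, eta_l=1e-1, iters=500):
    """groups: list of (X_k, y_k); returns approximate robust weights and lambda."""
    K = len(groups)
    w = np.zeros(d)
    lam = np.full(K, 1.0 / K)      # start at uniform group weights
    for _ in range(iters):
        losses = np.array([group_loss(w, X, y) for X, y in groups])
        # Exponentiated-gradient ascent on lambda; the gradient of the
        # penalized objective w.r.t. lambda_k is L_k - nu*(log(K*lam_k) + 1).
        lam = lam * np.exp(eta_l * (losses - nu * (np.log(K * lam) + 1.0)))
        lam /= lam.sum()
        # (Sub)gradient descent step on w for the lambda-weighted squared loss.
        grad = np.zeros(d)
        for lam_k, (X, y) in zip(lam, groups):
            pred = sigma(X @ w)
            act = (X @ w > 0).astype(float)          # ReLU subgradient
            grad += lam_k * 2.0 * (X.T @ ((pred - y) * act)) / len(y)
        w -= eta_w * grad
    return w, lam
```

This alternating scheme is only meant to make the min-max structure concrete; the paper's dual extrapolation update and its competitive-ratio guarantees are not captured by this simplified loop.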