A Gap Between Decision Trees and Neural Networks

arXiv:2601.03919v2 Announce Type: replace-cross
Abstract: We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation ($mathrm{R}mathrm{TV}$) seminorm, which controls the geometric complexity of level sets.
We first show that the hard tree indicator $1_A$ has infinite $mathrm{R}mathrm{TV}$. Moreover, two natural split-wise continuous surrogates–piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing–also have infinite $mathrm{R}mathrm{TV}$ in dimensions $d>1$, while Gaussian convolution yields finite $mathrm{R}mathrm{TV}$ but with an explicit exponential dependence on $d$.
We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) versus score learning (learning a calibrated score close to $1_A$). For classification, we construct a smooth barrier score $S_A$ with finite $mathrm{R}mathrm{TV}$ whose fixed threshold $tau=1$ exactly recovers the box. Under a mild tube-mass condition near $partial A$, we prove an $L_1(P)$ calibration bound that decays polynomially in a sharpness parameter, along with an explicit $mathrm{R}mathrm{TV}$ upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy–complexity tradeoff and how threshold selection shifts where training lands along it.

Liked Liked