What Functions Does XGBoost Learn?

arXiv:2601.05444v1 Announce Type: cross
Abstract: This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty\text{-XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ with penalty $V^{d, s}_{\infty\text{-XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty\text{-ST}}$ and $V^{d, s}_{\infty\text{-XGB}}(\cdot)$ in terms of Hardy–Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty\text{-ST}}: V^{d, s}_{\infty\text{-XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
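To make the two estimation problems mentioned in the abstract concrete, a schematic version reads as follows; the squared-error loss and the penalty weight $\lambda \ge 0$ are illustrative assumptions, not details taken from the paper. The penalized regression problem over the extended class would be
$$
\hat f \;\in\; \operatorname*{arg\,min}_{f \in \mathcal{F}^{d, s}_{\infty\text{-ST}}} \;\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 \;+\; \lambda\, V^{d, s}_{\infty\text{-XGB}}(f),
$$
and the constrained least squares estimator appearing in the rate result would be
$$
\hat f_V \;\in\; \operatorname*{arg\,min}_{\substack{f \in \mathcal{F}^{d, s}_{\infty\text{-ST}} \\ V^{d, s}_{\infty\text{-XGB}}(f) \le V}} \;\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 .
$$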
