Incomplete U-Statistics of Equireplicate Designs: Berry-Esseen Bound and Efficient Construction

arXiv:2510.20755v3 Announce Type: replace-cross
Abstract: U-statistics are a fundamental class of estimators that generalize the sample mean and underpin much of nonparametric statistics. Although extensively studied in both statistics and probability, key challenges remain: their high computational cost – addressed partly through incomplete U-statistics – and their non-standard asymptotic behavior in the degenerate case, which typically requires resampling methods for hypothesis testing. This paper presents a novel perspective on U-statistics, grounded in hypergraph theory and combinatorial designs. Our approach bypasses the traditional Hoeffding decomposition, the main analytical tool in this literature but one that is highly sensitive to degeneracy. By characterizing the dependence structure of a U-statistic, we derive a Berry-Esseen bound valid for incomplete U-statistics of deterministic designs, yielding conditions under which Gaussian limiting distributions can be established even in degenerate cases and when the order diverges. We also introduce efficient algorithms to construct incomplete U-statistics based on equireplicate designs, a subclass of deterministic designs that, in certain cases, achieve minimum variance. Beyond its theoretical contributions, our framework provides a systematic way to construct permutation-free counterparts to tests based on degenerate U-statistics, as demonstrated in experiments with kernel-based tests using the Maximum Mean Discrepancy and the Hilbert-Schmidt Independence Criterion.

Liked Liked