Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability

arXiv:2512.20368v2 Announce Type: replace
Abstract: Statistical inference in contextual bandits is challenging due to the adaptive, non-i.i.d. nature of the data. A growing body of work shows that classical least-squares inference can fail under adaptive sampling, and that valid confidence intervals for linear functionals typically require an inflation of order $sqrt{d log T}$. This phenomenon — often termed the price of adaptivity — reflects the intrinsic difficulty of reliable inference under general contextual bandit policies. A key structural condition that overcomes this limitation is the stability condition of Lai and Wei, which requires the empirical feature covariance to converge to a deterministic limit. When stability holds, the ordinary least-squares estimator satisfies a central limit theorem, and classical Wald-type confidence intervals remain asymptotically valid under adaptation, without incurring the $sqrt{d log T}$ price of adaptivity.
In this paper, we propose and analyze a regularized EXP4 algorithm for linear contextual bandits. Our first main result shows that this procedure satisfies the Lai–Wei stability condition and therefore admits valid Wald-type confidence intervals for linear functionals. We additionally provide quantitative rates of convergence in the associated central limit theorem. Our second result establishes that the same algorithm achieves regret guarantees that are minimax optimal up to logarithmic factors, demonstrating that stability and statistical efficiency can coexist within a single contextual bandit method. As an application of our theory, we show how it can be used to construct confidence intervals for the conditional average treatment effect (CATE) under adaptively collected data. Finally, we complement our theory with simulations illustrating the empirical normality of the resulting estimators and the sharpness of the corresponding confidence intervals.

Liked Liked