Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

arXiv:2603.13742v1 Announce Type: cross
Abstract: We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $widetilde{O}(sqrt{KT})$ is achievable with $O(log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(loglog T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $Omega(K/W)$ batches to achieve near-minimax regret $widetilde{O}(sqrt{KT})$ , even under adaptive grids. In particular, logarithmic memory rules out $K$-independent batch complexity.
Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $Omega(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory allows only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using $O(log T)$ bits of memory and $widetilde{O}(K)$ batches that achieves regret $widetilde{O}(sqrt{KT})$, which nearly matches our lower bound.

Liked Liked