Efficient Swap Regret Minimization in Combinatorial Bandits

arXiv:2602.02087v1 Announce Type: cross
Abstract: This paper addresses the problem of designing efficient no-swap regret algorithms for combinatorial bandits, where the number of actions $N$ is exponentially large in the dimensionality of the problem. In this setting, designing efficient no-swap regret translates to sublinear — in horizon $T$ — swap regret with polylogarithmic dependence on $N$. In contrast to the weaker notion of external regret minimization – a problem which is fairly well understood in the literature – achieving no-swap regret with a polylogarithmic dependence on $N$ has remained elusive in combinatorial bandits. Our paper resolves this challenge, by introducing a no-swap-regret learning algorithm with regret that scales polylogarithmically in $N$ and is tight for the class of combinatorial bandits. To ground our results, we also demonstrate how to implement the proposed algorithm efficiently — that is, with a per-iteration complexity that also scales polylogarithmically in $N$ — across a wide range of well-studied applications.

Liked Liked