Variance-Optimal Arm Selection: Misallocation Minimization and Best Arm Identification

arXiv:2505.11985v3 Announce Type: replace-cross
Abstract: This paper studies the problem of selecting the arm with the highest variance from a set of $K$ independent arms. We consider two settings: (i) the misallocation minimization (MM) setting, which penalizes the number of pulls of arms that are suboptimal in terms of variance, and (ii) the fixed-budget best arm identification (BAI) setting, which evaluates an algorithm's ability to identify the arm with the highest variance after a fixed number of pulls. For the MM setting, we develop a novel online algorithm called UCB-VV and show that, for bounded rewards, its upper bound on misallocation grows as $\mathcal{O}\left(\log{n}\right)$, where $n$ is the horizon. By deriving a lower bound on the misallocation, we show that UCB-VV is order optimal. For the fixed-budget BAI setting, we propose the SHVV algorithm. We show that the upper bound on the error probability of SHVV decays as $\exp\left(-\frac{n}{\log(K) H}\right)$, where $H$ represents the complexity of the problem, and that this rate matches the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance and standard deviation. Leveraging the same inequality, we derive a concentration inequality for the empirical Sharpe ratio (SR) under sub-Gaussian distributions, which was previously unknown in the literature. Empirical simulations show that UCB-VV consistently outperforms $\epsilon$-greedy across different sub-optimality gaps, though it is surpassed by VTS, which exhibits the lowest misallocation but lacks theoretical guarantees. We also illustrate the superior performance of SHVV against uniform sampling in the fixed-budget setting under 6 different setups. Finally, we conduct a case study to empirically evaluate the performance of UCB-VV and SHVV in call option trading on $100$ stocks generated using GBM.
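
To make the fixed-budget setting concrete, the following is a minimal sketch of a generic halving procedure that ranks arms by sample variance instead of sample mean. It is not the paper's SHVV algorithm, whose details the abstract does not give. The function names (`halving_by_variance`, `make_uniform_arm`), the even budget split across rounds, and the uniform reward distributions are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def halving_by_variance(arms, budget, rng):
    """Generic halving that keeps the arms with the largest sample variance.

    Hypothetical helper, not the paper's SHVV.
    arms: list of callables, each taking rng and returning one reward.
    budget: approximate total number of pulls.
    Returns the index of the surviving arm.
    """
    survivors = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Assumed allocation: split the budget evenly over rounds and
        # surviving arms; a sample variance needs at least 2 pulls.
        pulls = max(2, budget // (rounds * len(survivors)))
        scores = {}
        for i in survivors:
            samples = [arms[i](rng) for _ in range(pulls)]
            scores[i] = statistics.variance(samples)
        # Keep the half of the arms with the largest sample variance.
        survivors.sort(key=lambda i: scores[i], reverse=True)
        survivors = survivors[: max(1, math.ceil(len(survivors) / 2))]
    return survivors[0]


def make_uniform_arm(half_width):
    # Bounded reward, uniform on [-half_width, half_width];
    # its variance is half_width**2 / 3.
    return lambda rng: rng.uniform(-half_width, half_width)


# Four arms with well-separated variances; the last one has the highest.
arms = [make_uniform_arm(w) for w in (0.2, 0.4, 0.6, 1.0)]
best = halving_by_variance(arms, budget=4000, rng=random.Random(0))
print(best)
```

Because the variances in this toy instance are well separated, the procedure should pick out the last arm with a modest budget; harder instances (small variance gaps) are where the complexity term $H$ in the error bound would matter.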
