Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?
arXiv:2507.11891v2 Announce Type: replace
Abstract: We study A/B experiments designed to compare the performance of two recommendation algorithms. Prior work has observed that the stable unit treatment value assumption (SUTVA) often does not hold in large-scale recommendation systems, and hence the estimate of the global treatment effect (GTE) is biased. Specifically, units under the treatment and control algorithms contribute to a shared pool of data that subsequently trains both algorithms, resulting in interference between the two groups. In this paper, we investigate when such interference affects the decision of which algorithm is better. We formalize this question under a multi-armed bandit framework and theoretically characterize when the sign of the difference-in-means estimator of the GTE under data sharing aligns with or contradicts the sign of the true GTE. Our analysis identifies the level of exploration versus exploitation as a key determinant of how data sharing impacts decision making, and we propose a detection procedure based on ramp-up experiments that signals incorrect algorithm comparisons in practice.
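The setting described in the abstract can be illustrated with a minimal simulation (a hypothetical sketch, not the paper's model: the epsilon-greedy learners, Gaussian rewards, arm means, and 50/50 unit assignment are all illustrative assumptions). A shared reward log trains both a more-exploitative "treatment" policy and a more-exploratory "control" policy, and the difference-in-means of observed rewards under sharing can be compared against the GTE obtained by deploying each policy alone on the whole population.

```python
import random

def eps_greedy_arm(rng, eps, counts, est):
    """Pick an arm: pull each arm once to initialize, then explore w.p. eps."""
    if 0 in counts:
        return counts.index(0)
    if rng.random() < eps:
        return rng.randrange(len(counts))
    return max(range(len(counts)), key=est.__getitem__)

def run_shared(eps_t, eps_c, n_rounds, means, seed=0):
    """A/B test with a shared data pool: every observed reward updates one
    set of arm estimates that both policies act on (the interference)."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    est = [0.0] * len(means)
    rew = {True: [], False: []}
    for _ in range(n_rounds):
        treated = rng.random() < 0.5                # 50/50 unit assignment
        eps = eps_t if treated else eps_c
        arm = eps_greedy_arm(rng, eps, counts, est)
        r = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        est[arm] += (r - est[arm]) / counts[arm]    # shared training update
        rew[treated].append(r)
    # difference-in-means estimator of the GTE under data sharing
    return sum(rew[True]) / len(rew[True]) - sum(rew[False]) / len(rew[False])

def run_global(eps, n_rounds, means, seed=0):
    """Deploy one policy on the whole population, trained on its own data."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    est = [0.0] * len(means)
    total = 0.0
    for _ in range(n_rounds):
        arm = eps_greedy_arm(rng, eps, counts, est)
        r = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        est[arm] += (r - est[arm]) / counts[arm]
        total += r
    return total / n_rounds

if __name__ == "__main__":
    means = [1.0, 0.5]                              # hypothetical arm means
    gte = run_global(0.05, 20000, means) - run_global(0.5, 20000, means)
    dim = run_shared(0.05, 0.5, 20000, means)
    print(f"true GTE ~ {gte:.3f}, difference-in-means under sharing ~ {dim:.3f}")
```

Here the "true" GTE is the gap between two counterfactual global deployments, while the experiment's estimator only sees the shared-data run; the paper's question is when the sign of the latter matches the former.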