Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values
arXiv:2603.00945v2. Announce Type: replace-cross.

Abstract: We study non-rectangular robust Markov decision processes under the average-reward criterion, where the ambiguity set couples transition probabilities across states and the adversary commits to a stationary kernel for the entire horizon. We show that any history-dependent policy achieving sublinear expected regret uniformly over the ambiguity set is robust-optimal, and that the robust value admits a minimax representation as the infimum over the ambiguity set of the classical optimal gains, without requiring any […]
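
The minimax representation stated in the abstract (robust value equals the infimum, over the ambiguity set, of the classical optimal gains) can be illustrated on a toy instance. The sketch below is not the paper's method. It assumes a finite ambiguity set of two hypothetical kernels, each unichain and aperiodic, and computes each optimal gain with relative value iteration, a standard technique swapped in here for illustration. The names `optimal_gain`, `make_kernel`, `rewards`, and `ambiguity_set` are all illustrative assumptions. The set is non-rectangular in the sense that the adversary must pick one whole kernel and cannot mix rows of different kernels across states.

```python
from typing import List

Kernel = List[List[List[float]]]  # P[s][a][t] = probability of moving s -> t under action a


def optimal_gain(P: Kernel, r: List[List[float]],
                 iters: int = 10_000, tol: float = 1e-12) -> float:
    """Optimal average reward of one fixed kernel via relative value iteration.

    Assumes the kernel is unichain and aperiodic so the iteration converges.
    """
    n = len(r)
    h = [0.0] * n  # relative value (bias) estimate, normalised so that h[0] = 0
    gain = 0.0
    for _ in range(iters):
        # One Bellman backup for every state.
        backup = [
            max(r[s][a] + sum(P[s][a][t] * h[t] for t in range(n))
                for a in range(len(r[s])))
            for s in range(n)
        ]
        gain = backup[0]                       # gain estimate at reference state 0
        new_h = [x - backup[0] for x in backup]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    return gain


def make_kernel(p: float, q: float) -> Kernel:
    """Hypothetical two-state kernel: from either state, action 0 reaches
    state 1 with probability p and action 1 reaches it with probability q."""
    return [[[1.0 - p, p], [1.0 - q, q]] for _ in range(2)]


# Reward 1 while in state 1, reward 0 while in state 0 (independent of the action).
rewards = [[0.0, 0.0], [1.0, 1.0]]

# Finite ambiguity set (an assumption made for this sketch).
ambiguity_set = [make_kernel(0.9, 0.2), make_kernel(0.3, 0.6)]

gains = [optimal_gain(P, rewards) for P in ambiguity_set]
robust_value = min(gains)  # infimum of the classical optimal gains
print(gains, robust_value)
```

In this toy instance the next-state distribution does not depend on the current state, so the optimal gain of each kernel is max(p, q) and the output can be checked by hand: the gains are approximately 0.9 and 0.6, and their minimum is approximately 0.6. The conditions under which the paper's representation holds are cut off in the abstract, so the finite-set and unichain assumptions here should be read as simplifications for the example only.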