Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^\pi$ Realizability for Deterministic Dynamics
arXiv:2603.00716v1 Announce Type: cross Abstract: We study computationally and statistically efficient reinforcement learning under the linear $Q^\pi$ realizability assumption, where every policy’s $Q$-function is linear in a given state-action feature representation. Prior methods in this setting are either computationally intractable or require (local) access to a simulator. In this paper, we propose a computationally efficient online RL algorithm, named Frozen Policy Iteration, under the linear $Q^\pi$ realizability setting that works for Markov Decision Processes (MDPs) with stochastic initial […]
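The linear $Q^\pi$ realizability assumption says that for every policy $\pi$ there exists a weight vector $w_\pi$ such that $Q^\pi(s,a) = \langle \phi(s,a), w_\pi \rangle$ for a known feature map $\phi$. A minimal sketch of what this structure looks like, with a hypothetical feature map `phi` and placeholder weights (not the paper's algorithm):

```python
# Sketch of linear Q^pi realizability: Q^pi(s, a) = <phi(s, a), w_pi>.
# The feature map and weights below are illustrative placeholders only.

def phi(state, action):
    """Hypothetical d = 3 state-action feature map."""
    return [1.0, float(state), float(state) * float(action)]

def q_value(state, action, w_pi):
    """Q^pi(s, a) as an inner product of features with policy-specific weights."""
    return sum(f * w for f, w in zip(phi(state, action), w_pi))

# Weights realizing some fixed policy pi's Q-function (hypothetical values).
w_pi = [0.5, -0.2, 1.0]
print(q_value(2, 1, w_pi))  # <[1, 2, 2], [0.5, -0.2, 1.0]> = approximately 2.1
```

The key point is that only the weight vector changes with the policy; the feature map is shared, which is what lets policy-evaluation reduce to estimating a $d$-dimensional vector.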