StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning
arXiv:2604.08620v1 Announce Type: new Abstract: Reinforcement learning is typically treated as a uniform, data-driven optimization process, where updates are guided by rewards and temporal-difference errors without explicitly exploiting global structure. In contrast, dynamic programming methods rely on structured information propagation, enabling efficient and stable learning. In this paper, we provide evidence that such structure can be recovered from the learning dynamics of distributional reinforcement learning. By analyzing the temporal evolution of return distributions, we identify signals that capture […]