Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning
Scaling cooperative multi-agent reinforcement learning (MARL) is fundamentally limited by cross-agent noise: when agents share a common reward, the actions of all $N$ agents jointly determine each agent’s learning signal, so cross-agent noise grows with $N$. In the policy gradient setting, per-agent gradient estimate variance scales as $Θ(N)$, yielding sample complexity $mathcal{O}(N/ε)$. We observe that many domains — cloud computing, transportation, power systems — have differentiable analytical models that prescribe efficient system states. In this work, we propose […]