Simulation-Free PSRO: Removing Game Simulation from Policy Space Response Oracles
arXiv:2601.05279v1 Announce Type: new
Abstract: Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. However, the computational cost of PSRO has become a significant limitation to its practical application. Our analysis shows that game simulation is the primary bottleneck in PSRO’s runtime. To address this issue, we conclude the concept of Simulation-Free PSRO and summarize existing methods that instantiate this concept. Additionally, we propose a novel Dynamic Window-based Simulation-Free PSRO, which introduces the concept of a strategy window to replace the original strategy set maintained in PSRO. The number of strategies in the strategy window is limited, thereby simplifying opponent strategy selection and improving the robustness of the best response. Moreover, we use Nash Clustering to select the strategy to be eliminated, ensuring that the number of strategies within the strategy window is effectively limited. Our experiments across various environments demonstrate that the Dynamic Window mechanism significantly reduces exploitability compared to existing methods, while also exhibiting excellent compatibility. Our code is available at https://github.com/enochliu98/SF-PSRO.