Multi-Agent Deep Reinforcement Learning with Contrastive Policy Diversification and Hierarchical Graph Networks for Urban Traffic Signal Control
Multi-Agent Reinforcement Learning (MARL) provides an effective approach to urban multi-intersection traffic signal control. However, existing methods face two fundamental challenges: policy homogenization and inefficient credit assignment. The former leads to convergent agent policies that fail to adapt to heterogeneous traffic patterns, while the latter prevents agents from accurately evaluating their individual contributions to system performance. To address these issues, this paper proposes a Multi-Agent Hierarchical Contrastive Learning Traffic Signal Control (MAHCL-TSC) model. The model incorporates an unsupervised contrastive learning module that enhances the discriminative power of state representations, thereby alleviating policy homogenization. It also introduces a hierarchical graph convolutional credit assignment network that leverages road-network topology and functional characteristics to enable structure-aware collaborative value estimation, significantly improving the precision of credit assignment. Building on these components, a Contrastive QTRAN with Hierarchical Graph Convolution (CQTRAN-HGC) algorithm is proposed, which jointly optimizes the contrastive learning loss and the QTRAN constraint loss. Experiments in the SUMO simulation environment on 4×4 and 6×6 grid networks demonstrate that our model outperforms mainstream baselines such as QTRAN, MADDPG, and MAPPO on key metrics, including average queue length, waiting time, and intersection pressure, validating its effectiveness in improving control efficiency and generalization capability.
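The joint objective mentioned above (contrastive loss plus QTRAN constraint loss) can be sketched in minimal form. This is an illustrative assumption, not the paper's exact implementation: it uses a standard InfoNCE contrastive term over two views of the agents' state embeddings, combined with an already-computed QTRAN loss via a weighting coefficient `lam` (both the function names and the weighting scheme are hypothetical).

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE contrastive loss: matching rows of z1 and z2 (two views of
    the same agent states) are positives; all other rows are negatives.
    Encourages discriminative, well-separated state representations."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)   # L2-normalize embeddings
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # pairwise cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # positives lie on the diagonal

def joint_loss(l_qtran, z, z_aug, lam=0.1):
    """Joint objective: QTRAN constraint loss plus a weighted contrastive term."""
    return l_qtran + lam * info_nce_loss(z, z_aug)
```

In practice the embeddings would come from each agent's state encoder and the two views from data augmentation or temporally adjacent observations; the sketch only shows how the two loss terms are combined.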