Loss explodes when going from single agent to multi agent? (ParallelEnv + action masking in PettingZoo)
I decided to do multi-agent RL for my bachelor thesis, and I created a multi-agent environment in which I want to benchmark multiple algorithms. I've been using the SB3 PPO implementation, and it works well enough when I only have one agent, but once I have more than one (even just two) the training completely breaks down. The loss jumps all over the place (from 5, to 10, to 300, to 2000, …) and I don't really know why.
I'm using action masking and the ParallelEnv API of PettingZoo, but unfortunately I haven't found any tutorials on how to use the SB3 library with parallel environments + action masking :/ There's one for AEC (https://pettingzoo.farama.org/tutorials/sb3/connect_four/), so I converted my environment to an AEC one, but like I said, it seems like it's not working properly (or I'm just doing something really wrong).
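For context, the core idea of action masking is to zero out the probability of invalid actions before sampling, usually by setting their logits to -inf. Here's a minimal NumPy sketch of that mechanic (the function name and values are just for illustration, not from my repo):

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax over logits where invalid actions (mask == False) get probability 0."""
    # Invalid actions are pushed to -inf so exp(-inf) = 0
    masked = np.where(mask, logits, -np.inf)
    # Subtract the max valid logit for numerical stability
    exp = np.exp(masked - masked[mask].max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
mask = np.array([True, False, True, True])  # action 1 is illegal
probs = masked_softmax(logits, mask)
# probs[1] is exactly 0.0; the remaining probabilities sum to 1
```

If the mask isn't applied consistently at both rollout time and during the PPO update, the probability ratios in the loss can blow up, which is one thing I'm trying to rule out.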
The link to my environment repo is https://github.com/mecubey/BachelorThesis-Code
You can find an explanation of the environment as well as the code (I tried my best to document it well) on there.
Would really appreciate some pointers & advice 🙂
submitted by /u/testaccountthrow1