Trainer For MARL That Fits With PettingZoo

Posted ⋅ 27 May 2026

After 9 months of work, I finally got my first successful run in a simple RL environment where the agent learns to find a target 🎉

I’m still validating more single-agent RL (SARL) scenarios, but I’m already thinking ahead to multi-agent RL (MARL) and would like some advice on architecture and trainer choice.

Current RL engine structure:

1. SimulationEngine
   • Handles both logic and physics orchestration
   • Calls the other layers internally
2. EnvironmentEngine
   • Handles environment logic
3. BulletWorld
   • Builds and manages the PyBullet world
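To make the layering concrete, here is a minimal sketch of how those three layers might compose. Everything in it is hypothetical: the method names (add_body, step_simulation, observe, reward, tick) are illustrative assumptions, not the author's code, and BulletWorld is a pure-Python stand-in rather than a real PyBullet wrapper. The point is only that the orchestrating layer already returns agent-keyed dicts.

```python
# Hypothetical sketch of the three-layer structure. Class names come from the
# post; every method name and the toy physics are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class BulletWorld:
    """Stand-in for the layer that builds and steps the PyBullet world."""
    positions: dict = field(default_factory=dict)

    def add_body(self, name, position):
        self.positions[name] = list(position)

    def step_simulation(self, velocities):
        # Integrate one fixed time step; a real version would call PyBullet here.
        for name, (vx, vy) in velocities.items():
            self.positions[name][0] += vx
            self.positions[name][1] += vy


class EnvironmentEngine:
    """Task logic: what each agent observes and how it is rewarded."""

    def __init__(self, target):
        self.target = target

    def observe(self, world, agent):
        # Observation is the vector from the agent to the target.
        x, y = world.positions[agent]
        return (self.target[0] - x, self.target[1] - y)

    def reward(self, world, agent):
        # Negative Euclidean distance to the target.
        dx, dy = self.observe(world, agent)
        return -(dx * dx + dy * dy) ** 0.5


class SimulationEngine:
    """Orchestrates the other two layers and returns agent-keyed dicts."""

    def __init__(self, agents, target):
        self.agents = list(agents)
        self.world = BulletWorld()
        self.env = EnvironmentEngine(target)
        for agent in self.agents:
            self.world.add_body(agent, (0.0, 0.0))

    def tick(self, actions):
        self.world.step_simulation(actions)
        obs = {a: self.env.observe(self.world, a) for a in self.agents}
        rewards = {a: self.env.reward(self.world, a) for a in self.agents}
        return obs, rewards
```

For example, `SimulationEngine(["agent_1", "agent_2"], target=(3.0, 4.0)).tick({"agent_1": (1.0, 0.0), "agent_2": (0.0, 0.0)})` returns one observation and one reward per agent key.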

I also have a Gymnasium wrapper:

env = GymWrapper(simulation_engine)

which exposes clean reset() and step() APIs for SB3.

The thing is: internally SimulationEngine already works with dictionary-based outputs:

{
    "agent_1": observation,
    "agent_2": observation
}

For SARL with Gymnasium, I transform this dictionary into the single-agent format SB3 expects.
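A minimal sketch of that single-agent unwrapping, assuming the engine exposes hypothetical `reset()` and `tick(actions)` methods that return agent-keyed dicts. The adapter follows Gymnasium's signatures (`reset` returns `(obs, info)`, `step` returns `(obs, reward, terminated, truncated, info)`) but does not import gymnasium; a real wrapper would subclass `gymnasium.Env` and declare `observation_space` and `action_space`. `DictEngineStub` is a toy stand-in for SimulationEngine.

```python
# Hypothetical sketch: expose one agent of an agent-keyed engine through a
# Gymnasium-style single-agent API. Engine method names are assumptions.


class DictEngineStub:
    """Toy engine returning per-agent dicts; stands in for SimulationEngine."""

    def __init__(self, agents, horizon=3):
        self.agents = list(agents)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {a: [0.0, 0.0] for a in self.agents}

    def tick(self, actions):
        self.t += 1
        obs = {a: [float(self.t), float(actions.get(a, 0))] for a in self.agents}
        rewards = {a: float(actions.get(a, 0)) for a in self.agents}
        done = {a: self.t >= self.horizon for a in self.agents}
        return obs, rewards, done


class SingleAgentAdapter:
    """Unwraps the dict for one agent so an SB3-style learner sees plain values."""

    def __init__(self, engine, agent_id, max_steps=100):
        self.engine = engine
        self.agent_id = agent_id
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None, options=None):
        # seed/options are accepted for signature compatibility but unused here.
        self.steps = 0
        obs = self.engine.reset()
        return obs[self.agent_id], {}

    def step(self, action):
        self.steps += 1
        # Wrap the single action back into an agent-keyed dict for the engine.
        obs, rewards, done = self.engine.tick({self.agent_id: action})
        terminated = done[self.agent_id]
        # Time-limit truncation only applies if the episode did not already end.
        truncated = self.steps >= self.max_steps and not terminated
        return obs[self.agent_id], rewards[self.agent_id], terminated, truncated, {}
```

The design choice is that the engine never learns whether it is driven by one learner or many; only the thin adapter changes.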

But as I understand it, PettingZoo natively expects agent-keyed dictionaries, which makes me think my current architecture could fit MARL quite neatly without a major redesign.
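That intuition can be checked with a sketch of an adapter shaped like PettingZoo's Parallel API: `possible_agents` and `agents` lists, `reset(seed=None, options=None)` returning `(observations, infos)`, and `step(actions)` returning `(observations, rewards, terminations, truncations, infos)`, all keyed by agent. It does not import pettingzoo; a real version would subclass `pettingzoo.ParallelEnv` and define `observation_space(agent)` and `action_space(agent)`. The engine stub and its `reset`/`tick` methods are hypothetical.

```python
# Hypothetical sketch of a PettingZoo-Parallel-shaped adapter around an
# agent-keyed engine. The engine stub and its method names are assumptions.


class DictEngineStub:
    """Toy engine returning per-agent dicts; stands in for SimulationEngine."""

    def __init__(self, agents, horizon=3):
        self.agents = list(agents)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {a: [0.0] for a in self.agents}

    def tick(self, actions):
        self.t += 1
        obs = {a: [float(self.t)] for a in actions}
        rewards = {a: float(act) for a, act in actions.items()}
        done = {a: self.t >= self.horizon for a in actions}
        return obs, rewards, done


class ParallelAdapter:
    """Passes agent-keyed dicts through with Parallel-API return shapes."""

    def __init__(self, engine, max_cycles=100):
        self.engine = engine
        self.possible_agents = list(engine.agents)
        self.agents = []
        self.max_cycles = max_cycles
        self.cycles = 0

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self.cycles = 0
        observations = self.engine.reset()
        infos = {a: {} for a in self.agents}
        return observations, infos

    def step(self, actions):
        # Assumes `actions` has one entry per live agent, as the Parallel API does.
        self.cycles += 1
        # The engine already speaks agent-keyed dicts, so actions pass straight through.
        observations, rewards, done = self.engine.tick(actions)
        terminations = {a: done[a] for a in self.agents}
        truncations = {a: self.cycles >= self.max_cycles for a in self.agents}
        infos = {a: {} for a in self.agents}
        # Finished agents are dropped from the live list.
        self.agents = [a for a in self.agents if not (terminations[a] or truncations[a])]
        return observations, rewards, terminations, truncations, infos
```

A typical loop calls `env.step({a: policy(obs[a]) for a in env.agents})` until `env.agents` is empty.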

My main concern is the trainer side.

SB3 + Gymnasium has been incredibly straightforward and I already have experience with it.

But for:

PettingZoo + ???

I’m stuck.

Initially I was considering RLlib because it seems to be the common answer, but honestly I don’t have the time or energy for a steep learning curve if cleaner alternatives exist.

I’m mainly interested in MAPPO and similar MARL algorithms.
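For context on what MAPPO needs from the environment side: it trains decentralized actors (each acting on its own observation) with a centralized critic that sees a global state. The sketch below shows one common way to build those inputs from an agent-keyed dict: concatenate every agent's observation in a fixed order and append a one-hot agent ID so a shared critic can tell agents apart. This is an illustrative assumption about one setup, not the implementation in any particular library; the function names are hypothetical.

```python
# Hypothetical sketch of MAPPO-style input construction from agent-keyed
# observations. One common choice, not a specific library's implementation.


def actor_inputs(observations):
    """Decentralized actors: each agent acts on its own local observation."""
    return {agent: list(obs) for agent, obs in observations.items()}


def critic_inputs(observations, agent_order):
    """Centralized critic: global state (all observations) plus one-hot agent ID."""
    # Fixed ordering keeps the critic's input layout stable across steps.
    global_state = [x for agent in agent_order for x in observations[agent]]
    inputs = {}
    for index, agent in enumerate(agent_order):
        one_hot = [1.0 if i == index else 0.0 for i in range(len(agent_order))]
        inputs[agent] = global_state + one_hot
    return inputs
```

With `observations = {"agent_1": [0.5, -1.0], "agent_2": [2.0, 0.0]}`, the critic input for `agent_2` is `[0.5, -1.0, 2.0, 0.0, 0.0, 1.0]`. Pinning `agent_order` explicitly avoids depending on the order in which the engine happens to emit keys.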

Questions:

• What trainer stack are people using with PettingZoo nowadays?
• RLlib vs. BenchMARL vs. AgileRL vs. something else?
• If you were building this from scratch today, what would you choose?

Any suggestions or experiences would be really appreciated.

submitted by /u/Public-Journalist820