Finished RL toybox repo: 6 small visual environments covering Q-learning, DQN, PPO, SAC, MCTS and multi-agent RL

Posted ⋅ 20 May 2026

Hey!

A few months ago I posted here about a small RL toy games repo I had started playing with.

At the time it was basically Snake + a couple of experiments, with a few things still half-working. I kept going with it and it has now turned into something a bit more complete:

https://github.com/bzznrc/rl-toybox

The green player is the RL agent; the others follow scripted logic.

The idea is to build a compact toybox: small arcade-style environments, each meant to showcase (and help me learn) a different family of RL methods in a way that is easy to inspect, run, and modify.
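To make the observation/action/reward vocabulary concrete, here is a minimal sketch of what a Snake-style environment of this kind might look like. It is not the repo's code: `TinySnakeEnv`, its `(head_x, head_y, food_x, food_y)` observation, the four-action encoding and the +1/-1/0 reward scheme are all hypothetical choices for illustration. `step` returns a simple `(obs, reward, done)` triple rather than the Gymnasium five-tuple.

```python
import random
from collections import deque


class TinySnakeEnv:
    """Hypothetical minimal Snake-style env (illustration only, not from rl-toybox).

    Actions: 0=up, 1=right, 2=down, 3=left.
    Reward: +1 for eating food, -1 for hitting a wall or itself, 0 otherwise.
    """

    MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]

    def __init__(self, size=8, seed=0):
        self.size = size
        self.rng = random.Random(seed)  # seeded so food placement is reproducible

    def reset(self):
        mid = self.size // 2
        self.body = deque([(mid, mid)])  # body[0] is the head
        self._place_food()
        return self._obs()

    def _place_food(self):
        # Food can only appear on cells not occupied by the snake.
        free = [(x, y) for x in range(self.size) for y in range(self.size)
                if (x, y) not in self.body]
        self.food = self.rng.choice(free)

    def _obs(self):
        hx, hy = self.body[0]
        fx, fy = self.food
        return (hx, hy, fx, fy)

    def step(self, action):
        dx, dy = self.MOVES[action]
        hx, hy = self.body[0]
        new_head = (hx + dx, hy + dy)
        x, y = new_head
        out_of_bounds = not (0 <= x < self.size and 0 <= y < self.size)
        # Conservative collision check: the current tail cell counts as occupied.
        if out_of_bounds or new_head in self.body:
            return self._obs(), -1.0, True
        self.body.appendleft(new_head)
        if new_head == self.food:
            if len(self.body) == self.size * self.size:
                return self._obs(), 1.0, True  # board is full: episode won
            self._place_food()
            return self._obs(), 1.0, False  # grow: keep the tail
        self.body.pop()  # no food eaten: tail follows the head
        return self._obs(), 0.0, False


env = TinySnakeEnv(size=4, seed=1)
obs = env.reset()
done = False
while not done:
    obs, reward, done = env.step(0)  # keep moving up until hitting the wall
```

Keeping the observation a small tuple makes it easy to print and inspect; swapping in a grid image or "danger ahead" flags only changes `_obs`.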

Current lineup:

Snake — value methods / Q-learning-style control
Bang — DQN-style discrete arena control
Jump — PPO / on-policy actor-critic
Vroom — SAC / continuous control
Flip — MCTS + self-play
Kick — multi-agent RL / CTDE with a shared policy
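For readers less familiar with some of these families, the sketch below shows the core quantity each one revolves around, written for a single transition or tree node. It is not taken from the repo and the function names are hypothetical. It covers tabular Q-learning (Snake; DQN uses the same target but with a neural Q-network and a periodically copied target network), PPO's clipped surrogate (Jump), SAC's soft critic target (Vroom), the classic UCT selection score for MCTS (Flip; AlphaZero-style self-play typically uses PUCT with a policy prior instead), and one common input layout for CTDE with a shared policy (Kick), where appending a one-hot agent id is an assumed design choice.

```python
import math


def q_learning_update(q, s, a, r, s_next, done, n_actions, alpha=0.1, gamma=0.99):
    """Tabular Q-learning: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = 0.0 if done else max(q.get((s_next, b), 0.0) for b in range(n_actions))
    td_target = r + gamma * best_next
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (td_target - old)
    return q[(s, a)]


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A).

    ratio = pi_new(a|s) / pi_old(a|s); this value is maximized (or its negative minimized).
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def sac_soft_target(r, done, q1_next, q2_next, logp_next, alpha=0.2, gamma=0.99):
    """SAC critic target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).

    q1_next, q2_next come from target critics at a' sampled from the current policy.
    """
    soft_v = min(q1_next, q2_next) - alpha * logp_next
    return r + (0.0 if done else gamma * soft_v)


def uct_score(child_value_sum, child_visits, parent_visits, c=1.4):
    """Classic UCT score used to pick which child to descend into during MCTS."""
    if child_visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child_value_sum / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore


def ctde_inputs(agent_obs):
    """CTDE with a shared policy.

    Each actor sees only its own obs plus a one-hot agent id (same weights for all
    agents); the centralized critic, used only during training, sees all obs.
    """
    n = len(agent_obs)
    actor_inputs = [list(obs) + [1.0 if j == i else 0.0 for j in range(n)]
                    for i, obs in enumerate(agent_obs)]
    critic_input = [x for obs in agent_obs for x in obs]
    return actor_inputs, critic_input
```

Real implementations vectorize all of these over batches (and over tree nodes for MCTS); the scalar versions are only meant to make each formula easy to read.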

Most of the games are now roughly where I wanted them to be, with a couple of exceptions (Vroom does not seem to train past level 4 out of 5 in my curriculum, and the way the agents play together in Kick is… very debatable).

I'd be very happy if anyone wants to take a look and give feedback on the environment design, the observations/actions/rewards, and repo clarity.

Hope you like it!

submitted by /u/ScazzaMage
