I made a Mario RL trainer with a live dashboard – would appreciate feedback
I’ve been experimenting with reinforcement learning and built a small project that trains a PPO agent to play Super Mario Bros locally. I mostly built it to better understand SB3 and training dynamics, rather than just running example notebooks.
It uses a Gym-compatible NES environment + Stable-Baselines3 (PPO). I added a simple FastAPI server that streams frames to a browser UI so I can watch the agent during training instead of only checking TensorBoard.
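For concreteness, the core wiring is roughly the sketch below. It assumes gym-super-mario-bros / nes-py for the NES environment and isn't a copy of the repo's code; depending on your gym vs. gymnasium versions you may need compatibility shims.

```python
# Rough sketch of the env + PPO setup (the repo's wrapper stack and
# hyperparameters may differ).
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace
from gym.wrappers import GrayScaleObservation
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
    env = JoypadSpace(env, SIMPLE_MOVEMENT)         # constrain the NES action space
    env = GrayScaleObservation(env, keep_dim=True)  # 1-channel frames
    return env

vec_env = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)  # temporal context

model = PPO("CnnPolicy", vec_env, verbose=1, tensorboard_log="./tb")
model.learn(total_timesteps=1_000_000)
```

The frame streaming can be as simple as an MJPEG endpoint; again a sketch, with `latest_frame` standing in for wherever the training callback drops the most recent `env.render()` output:

```python
import time
import cv2
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
latest_frame = None  # H x W x 3 uint8 array, written by the training loop

def mjpeg():
    # Yield the newest frame as a multipart JPEG stream the browser can
    # display in a plain <img> tag. Note cv2 assumes BGR ordering; convert
    # with cv2.cvtColor if colors look off.
    while True:
        if latest_frame is not None:
            ok, jpg = cv2.imencode(".jpg", latest_frame)
            if ok:
                yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
                       + jpg.tobytes() + b"\r\n")
        time.sleep(1 / 30)  # cap at ~30 fps

@app.get("/stream")
def stream():
    return StreamingResponse(
        mjpeg(), media_type="multipart/x-mixed-replace; boundary=frame")
```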
What I’ve been focusing on:
- Frame preprocessing and action space constraints
- Reward shaping (forward progress vs survival bias; rough sketch after this list)
- Stability over longer runs
- Checkpointing and resume logic
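The reward-shaping and checkpointing pieces look roughly like the sketch below; the wrapper and its constants are illustrative placeholders rather than the exact values in the repo.

```python
import gym
from stable_baselines3.common.callbacks import CheckpointCallback

class ForwardProgressReward(gym.Wrapper):
    """Hypothetical shaping wrapper: reward x-position gains, apply a small
    per-step penalty, and penalize episodes that end without reaching the flag."""

    def __init__(self, env, progress_scale=0.1, step_penalty=0.01, death_penalty=15.0):
        super().__init__(env)
        self.progress_scale = progress_scale
        self.step_penalty = step_penalty
        self.death_penalty = death_penalty
        self._last_x = 0

    def reset(self, **kwargs):
        self._last_x = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        x = info.get("x_pos", self._last_x)           # info dict from gym-super-mario-bros
        reward = self.progress_scale * (x - self._last_x) - self.step_penalty
        if done and not info.get("flag_get", False):  # died or ran out of time
            reward -= self.death_penalty
        self._last_x = x
        return obs, reward, done, info

# Checkpointing/resume via SB3's built-in CheckpointCallback and PPO.load():
checkpoint_cb = CheckpointCallback(save_freq=50_000, save_path="./checkpoints",
                                   name_prefix="ppo_mario")
# model.learn(total_timesteps=..., callback=checkpoint_cb)
# resume: model = PPO.load("./checkpoints/ppo_mario_500000_steps", env=vec_env)
#         model.learn(total_timesteps=..., reset_num_timesteps=False)
```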
Right now the agent reliably learns basic forward movement and obstacle handling, but performance across full levels is still noisy and depends heavily on seeds and hyperparameters.
If anyone here has experience with:
- PPO tuning in sparse-ish reward environments
- Curriculum learning for multi-level games
- Better logging / evaluation loops for SB3
I’d appreciate concrete suggestions. Happy to bring a collaborator onto the project.
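To clarify what I mean by the evaluation-loop question: right now I’m thinking in terms of SB3’s built-in EvalCallback on a separately wrapped env, roughly like this sketch (`make_env` is from the setup snippet above):

```python
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

# Mirror the training wrappers so observation shapes match the policy.
eval_env = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)

eval_cb = EvalCallback(
    eval_env,
    best_model_save_path="./eval/best",
    log_path="./eval/logs",
    eval_freq=25_000,        # training steps between evaluation runs
    n_eval_episodes=5,
    deterministic=True,      # keeps runs comparable across checkpoints
)
# model.learn(total_timesteps=..., callback=[checkpoint_cb, eval_cb])
```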
Repo: https://github.com/mgelsinger/mario-ai-trainer
I’m also curious about setting up something like Llama as a higher-level agent that helps the RL agent figure out what to do, with the goal of cutting training time down significantly. If anyone is familiar with that kind of setup, please reach out.