I made a Mario RL trainer with a live dashboard – would appreciate feedback
I’ve been experimenting with reinforcement learning and built a small project that trains a PPO agent to play Super Mario Bros locally. I mostly built it to better understand SB3 and training dynamics, rather than just running example notebooks.
It uses a Gym-compatible NES environment + Stable-Baselines3 (PPO). I added a simple FastAPI server that streams frames to a browser UI so I can watch the agent during training instead of only checking TensorBoard.
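For concreteness, the core wiring is roughly the sketch below. It assumes gym-super-mario-bros / nes-py for the NES environment and isn't a copy of the repo's code; depending on your gym vs. gymnasium versions you may need compatibility shims.

```python
# Rough sketch of the env + PPO setup (the repo's wrapper stack and
# hyperparameters may differ).
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
from nes_py.wrappers import JoypadSpace
from gym.wrappers import GrayScaleObservation
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
    env = JoypadSpace(env, SIMPLE_MOVEMENT)         # constrain the NES action space
    env = GrayScaleObservation(env, keep_dim=True)  # 1-channel frames
    return env

vec_env = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)  # temporal context

model = PPO("CnnPolicy", vec_env, verbose=1, tensorboard_log="./tb")
model.learn(total_timesteps=1_000_000)
```

The frame streaming can be as simple as an MJPEG endpoint; again a sketch, with `latest_frame` standing in for wherever the training callback drops the most recent `env.render()` output:

```python
import time
import cv2
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
latest_frame = None  # H x W x 3 uint8 array, written by the training loop

def mjpeg():
    # Yield the newest frame as a multipart JPEG stream the browser can
    # display in a plain <img> tag. Note cv2 assumes BGR ordering; convert
    # with cv2.cvtColor if colors look off.
    while True:
        if latest_frame is not None:
            ok, jpg = cv2.imencode(".jpg", latest_frame)
            if ok:
                yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
                       + jpg.tobytes() + b"\r\n")
        time.sleep(1 / 30)  # cap at ~30 fps

@app.get("/stream")
def stream():
    return StreamingResponse(
        mjpeg(), media_type="multipart/x-mixed-replace; boundary=frame")
```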
What I’ve been focusing on:
- Frame preprocessing and action space constraints
- Reward shaping (forward progress vs survival bias; rough sketch after this list)
- Stability over longer runs
- Checkpointing and resume logic
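The reward-shaping and checkpointing pieces look roughly like the sketch below; the wrapper and its constants are illustrative placeholders rather than the exact values in the repo.

```python
import gym
from stable_baselines3.common.callbacks import CheckpointCallback

class ForwardProgressReward(gym.Wrapper):
    """Hypothetical shaping wrapper: reward x-position gains, apply a small
    per-step penalty, and penalize episodes that end without reaching the flag."""

    def __init__(self, env, progress_scale=0.1, step_penalty=0.01, death_penalty=15.0):
        super().__init__(env)
        self.progress_scale = progress_scale
        self.step_penalty = step_penalty
        self.death_penalty = death_penalty
        self._last_x = 0

    def reset(self, **kwargs):
        self._last_x = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        x = info.get("x_pos", self._last_x)           # info dict from gym-super-mario-bros
        reward = self.progress_scale * (x - self._last_x) - self.step_penalty
        if done and not info.get("flag_get", False):  # died or ran out of time
            reward -= self.death_penalty
        self._last_x = x
        return obs, reward, done, info

# Checkpointing/resume via SB3's built-in CheckpointCallback and PPO.load():
checkpoint_cb = CheckpointCallback(save_freq=50_000, save_path="./checkpoints",
                                   name_prefix="ppo_mario")
# model.learn(total_timesteps=..., callback=checkpoint_cb)
# resume: model = PPO.load("./checkpoints/ppo_mario_500000_steps", env=vec_env)
#         model.learn(total_timesteps=..., reset_num_timesteps=False)
```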
Right now the agent reliably learns basic forward movement and obstacle handling, but performance across full levels is still noisy and depends heavily on seeds and hyperparameters.
If anyone here has experience with:
- PPO tuning in sparse-ish reward environments
- Curriculum learning for multi-level games
- Better logging / evaluation loops for SB3
I’d appreciate concrete suggestions. Happy to bring a collaborator onto the project.
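To clarify what I mean by the evaluation-loop question: right now I’m thinking in terms of SB3’s built-in EvalCallback on a separately wrapped env, roughly like this sketch (`make_env` is from the setup snippet above):

```python
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

# Mirror the training wrappers so observation shapes match the policy.
eval_env = VecFrameStack(DummyVecEnv([make_env]), n_stack=4)

eval_cb = EvalCallback(
    eval_env,
    best_model_save_path="./eval/best",
    log_path="./eval/logs",
    eval_freq=25_000,        # training steps between evaluation runs
    n_eval_episodes=5,
    deterministic=True,      # keeps runs comparable across checkpoints
)
# model.learn(total_timesteps=..., callback=[checkpoint_cb, eval_cb])
```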
Repo: https://github.com/mgelsinger/mario-ai-trainer
I’m also curious about setting up something like Llama as a higher-level agent that helps the RL agent figure out what to do, with the goal of cutting training time down significantly. If anyone is familiar with that kind of setup, please reach out.