Struggling with RL hyperparameter tuning + reward shaping for an Asteroids-style game – what’s enough and what’s overkill?

Hey all,

I’m building an RL agent to play an Asteroids-style arcade game that I made.

I can get decent models working now, and I've definitely improved compared to the first RL version I ever built. The agent survives much longer than it did at the start, and by watching replays after training I can at least form hypotheses about what's helping or hurting. So it's not totally random guessing anymore, but I still feel like I'm fumbling around more than I should.

I’m still manually trying different hyperparameters like learning rate, gamma, clipping, etc., and it takes a lot of time. I also don’t fully understand all the training graphs and action percentage plots, so I’m not always confident in why something improved or got worse.
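To make that concrete, this is basically the kind of manual grid I've been walking through (names and values are just examples from my runs, not a recommendation — and the actual training call is omitted):

```python
import itertools

# PPO-style knobs I keep cycling through by hand.
# All values here are illustrative, not tuned recommendations.
search_space = {
    "learning_rate": [3e-4, 1e-4],
    "gamma": [0.99, 0.995],   # discount factor
    "clip_range": [0.1, 0.2], # PPO clipping epsilon
}

def all_configs(space):
    """Expand the grid into individual config dicts."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(all_configs(search_space))
# 2 * 2 * 2 = 8 combinations, each needing a full training run
```

Even this small grid means eight full training runs, which is why it eats so much time.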

While reading, I came across things like population-based training (PBT) with Ray Tune, Bayesian optimization, and other auto-tuning methods, but I honestly have no idea what's actually reasonable for a project like this and what's just complete overkill.
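For example, is something as simple as random search already a big step up from my hand grid? This is my understanding of what a minimal random-search baseline looks like before reaching for Ray Tune or Bayesian optimization (the `evaluate` function is a placeholder for a real training run, and the sampling ranges are just guesses):

```python
import random

random.seed(0)

def sample_config():
    """Draw one hyperparameter configuration at random
    (log-uniform for learning rate, uniform/choice for the rest)."""
    return {
        "learning_rate": 10 ** random.uniform(-5, -3),
        "gamma": random.uniform(0.95, 0.999),
        "clip_range": random.choice([0.1, 0.2, 0.3]),
    }

def evaluate(config):
    """Placeholder for 'train an agent with this config and
    return its mean episode score'. Dummy scoring here only so
    the sketch runs end to end."""
    return -abs(config["learning_rate"] - 3e-4)

# Try 20 random configs and keep the best-scoring one.
best = max((sample_config() for _ in range(20)), key=evaluate)
```

Is this roughly the right mental model, or do PBT/Bayesian methods change the picture fundamentally rather than just sampling smarter?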

I’m also struggling a lot with reward shaping. I’ve been experimenting with rewards for survival time, shooting asteroids, staying out of danger, penalties, and so on, but I feel like I’m just adding reward terms without really knowing which ones are meaningful and which ones are just noise.
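For reference, my reward function has ended up as a bag of weighted terms like this (all weights hypothetical — they're exactly the part I can't justify beyond trial and error):

```python
def shaped_reward(destroyed_asteroid, died, near_miss, dt):
    """Sum of the reward terms I've been stacking up.
    Every weight below was picked by feel, not principle."""
    reward = 0.0
    reward += 0.01 * dt   # small bonus per second survived
    if destroyed_asteroid:
        reward += 1.0     # reward for shooting an asteroid
    if near_miss:
        reward -= 0.1     # discourage hugging danger
    if died:
        reward -= 5.0     # terminal penalty for dying
    return reward
```

How do people decide which of these terms actually matter, and how do the relative magnitudes get set? Right now I just tweak one weight, retrain, and squint at the curves.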

I’d really like to understand how people think about this instead of just trial and error. If anyone here has worked on RL for arcade-style games or similar environments, I’d love some advice on how you approached hyperparameter tuning and how you figured out a solid reward setup.

Also happy to check out any videos, articles, or resources that helped you understand this stuff better.

Thanks a lot

submitted by /u/GSevenStars