Can’t train a pixel-based PPO agent on the Hopper environment

Hi everyone. This is my first question on Reddit, so I do not know if this is the right place to post it.

I have been trying to train a PPO model to make a Hopper agent “walk”. I have implemented my own version of the PPO algorithm, so that I can modify the architecture more easily.
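For reference, the core of my implementation is the standard clipped surrogate objective from the PPO paper. Here is a minimal sketch of that loss, stripped of any framework code, which I used as a sanity check (the function name and the `eps=0.2` default are just illustrative choices, not necessarily what anyone else should use):

```python
import math

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate PPO objective (returned as a loss to minimize).

    logp_new / logp_old: per-sample log-probabilities of the taken actions
    under the current policy and the data-collecting policy; adv: advantages.
    """
    losses = []
    for ln, lo, a in zip(logp_new, logp_old, adv):
        ratio = math.exp(ln - lo)                    # importance ratio pi_new / pi_old
        clipped = max(min(ratio, 1 + eps), 1 - eps)  # clip(ratio, 1-eps, 1+eps)
        losses.append(-min(ratio * a, clipped * a))  # pessimistic (lower) bound
    return sum(losses) / len(losses)

# Identical policies: ratio = 1 everywhere, so the loss is just -mean(adv).
print(ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, -1.0]))  # → 0.0

# Ratio 2 with a positive advantage gets clipped at 1 + eps = 1.2.
print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))  # → -1.2
```

If this loss checks out, the plateau is more likely in the advantage estimation, normalization, or hyperparameters than in the objective itself.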

I have already done an extensive hyperparameter search (manually), tried both simpler and more complex reward functions, and discussed the problem with Claude, Gemini, and ChatGPT, but none of them helped the way I hoped. I have also tried training longer, but at a certain point it seems to reach a plateau and does not improve anymore.

I am also struggling to find online resources about this exact combination of algorithm and environment.

The best I could get was two consecutive steps.

If anyone had some tips about what could work for this task, I would really appreciate it!!

submitted by /u/skroll18
