PPO rewards start crashing after some point on training

Hi, I was trying to implement PPO in PyTorch to solve the Pendulum-v1 environment. There's no problem at the beginning of training, but after some point the rewards start crashing. I've tried to figure out why this happens, but I still haven't found the cause. The repo I'm working on contains only the basics: the model implementation, the training loop, and some utils. Can someone help me if they know why this is happening?
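For context, the core of any PPO update is the clipped surrogate loss. This is a generic dependency-free sketch of that objective, not the repo's actual code; the function name and inputs are illustrative:

```python
import math

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to minimize) over a batch.

    new_logp / old_logp: log-probs of the taken actions under the
    current and behavior policies; advantages: estimated advantages.
    """
    losses = []
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)               # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        # pessimistic bound: take the smaller of the two surrogates,
        # then negate because we minimize the loss
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)

# tiny smoke check: identical policies, unit advantages -> loss = -1.0
loss = ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
print(loss)  # -1.0
```

If the ratio is computed or clipped incorrectly (e.g. wrong sign, or clipping applied after multiplying by the advantage), training often looks fine early on and then collapses, which matches the symptom described.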

Repo link: https://github.com/Gradient-Descent-is-Awesome/RL-Testing

submitted by /u/YahudiKundakcisi
