Help with a PPO implementation without PyTorch/TF
Hey! I'm trying to implement a very simple PPO algorithm with NumPy, but I'm struggling with two things:

- The actor network does not seem to be learning, and I don't know why.
- Some values become NaN after a few epochs.

I tried to comment the code as well as I could to keep it simple. Thank you very much for taking the time to help me.

The environment, a small 2D grid:

""" GAME […]