Some more thoughts on debugging RL implementations
Hi! Recently, I have tried to implement a number of RL algorithms, such as PPO for MuJoCo and reduced versions of DQN for Pong and MuZero (only for CartPole…), and I wanted to share some impressions from debugging these implementations. Many points have already been written up in other posts (see some links below), so I’ll focus on what I found most important.

Approach

I found it best to implement a simpler, related version of your algorithm first […]