Double DQN shows self-correcting loss spikes in chess self-play — normal behavior or architecture issue?
I’ve been training a Double DQN chess agent with self-play and comparing it against DQN and SARSA. Around the midpoint of training the loss spiked to roughly 192, but by the end it had recovered to about 0.7. I found that interesting because it might mean the agent struggled for a while before stabilizing.

Setup: For a fair comparison, I used the same network architecture as the DQN model: Linear(66→256) […]
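
For readers comparing the two methods, here is a minimal plain-Python sketch of how the Double DQN bootstrap target differs from the standard DQN one. It is not the code from this post: the function names, the `gamma` default, and the toy Q-values are all assumptions made for illustration, and a real agent would compute this on batched tensors.

```python
# Illustrative sketch only (not the poster's code): all names and values
# here are hypothetical and chosen just to show the two target rules.

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]

# Toy next-state Q-values for three legal moves (made-up numbers).
next_q_online = [1.0, 3.0, 2.0]
next_q_target = [5.0, 0.5, 4.0]

y_dqn = dqn_target(0.0, next_q_target, gamma=0.9)
y_ddqn = double_dqn_target(0.0, next_q_online, next_q_target, gamma=0.9)
print(y_dqn, y_ddqn)
```

In this toy case the DQN target bootstraps from the target network's largest estimate, while the Double DQN target uses the target network's value for the move the online network prefers, which is the mechanism meant to reduce overestimation.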