RL Chess Bot Isn’t Learning Anything Useful
Hey guys.
For the past couple of months, I've been working on a chess bot that uses a Dueling DDQN (Dueling Double DQN).
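For context, the network is the standard dueling split: a shared body feeding separate value and advantage streams, combined as Q = V + A - mean(A). A rough PyTorch sketch of what I mean (layer sizes and the board/move encodings below are placeholders, not my exact setup):

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Shared trunk -> separate state-value and advantage streams."""
    def __init__(self, n_features: int = 773, n_actions: int = 4672, hidden: int = 512):
        # n_features / n_actions are placeholder sizes for the board encoding
        # and the move-index space, not my actual numbers.
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.body(x)
        v = self.value(h)                       # (B, 1)
        a = self.advantage(h)                   # (B, n_actions)
        # Standard dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
        return v + a - a.mean(dim=1, keepdim=True)
```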
I started with pure RL training, but the agent just learned to play garbage moves and kept hanging pieces.
So I decided to try some supervised learning before diving into RL. After training on a few million positions from masters' games, the model can crush Stockfish Level 3 (around 1300 Elo, if I'm not mistaken).
However, when I load the SL model's weights into my RL pipeline… everything crumbles. Maximum Q-values stay stuck around 2.2, gradients (before clipping) sit at 60 to 100, and after around 75k self-play games the model is back to playing garbage.
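For concreteness, the update itself is just the standard Double-DQN step with gradient-norm clipping; a simplified sketch (the batch layout, Huber loss, and clip threshold here are illustrative, and legal-move masking is omitted):

```python
import torch
import torch.nn.functional as F

def ddqn_update(online_net, target_net, optimizer, batch, gamma=0.99, max_grad_norm=10.0):
    """One Double-DQN step with gradient-norm clipping (hypothetical batch layout)."""
    # states/next_states: float tensors, actions: long, rewards/dones: float 0/1.
    states, actions, rewards, next_states, dones = batch

    with torch.no_grad():
        # Online net selects the argmax action, target net evaluates it (Double DQN).
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    q = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q, targets)

    optimizer.zero_grad()
    loss.backward()
    # This is where I'm seeing pre-clip norms in the 60-100 range.
    grad_norm = torch.nn.utils.clip_grad_norm_(online_net.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()
```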
I tried seeding the replay buffer with positions from masters’ games, and that seemed to help a bit at first, but it devolved into random piece shuffling yet again.
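By seeding I mean pre-filling the buffer with transitions extracted from master games before any self-play happens, roughly like this (buffer size, seed count, and transition format are placeholders):

```python
from collections import deque

# Replay buffer of (state, action, reward, next_state, done) tuples.
replay_buffer = deque(maxlen=500_000)  # capacity is a placeholder

def seed_buffer(master_transitions, n_seed=100_000):
    """Pre-fill the buffer with transitions derived from master games."""
    for i, transition in enumerate(master_transitions):
        if i >= n_seed:
            break
        replay_buffer.append(transition)
```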
I lowered the learning rate, implemented Polyak averaging, and tried a whole slew of other modifications, but nothing seems to work out.
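By Polyak averaging I mean soft target-network updates instead of periodic hard copies, along these lines (the tau here is a placeholder, not my tuned value):

```python
import torch

@torch.no_grad()
def polyak_update(online_net: torch.nn.Module, target_net: torch.nn.Module, tau: float = 0.005):
    """Soft target update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_online)
```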
I understand that Dueling DDQN is not the best choice for chess, and that actor-critic methods would serve me much better, but I’m doing this as a learning exercise and would like to see how far I can take it.
Is there anything else I should try? Perhaps freezing the weights of the body of the neural network for a while? Or should I continue training for another 100k games and see what happens?
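By freezing the body I mean something like this (a sketch that assumes the trunk is exposed as net.body, as in the architecture sketch above):

```python
def set_body_frozen(net, frozen: bool = True):
    """Freeze/unfreeze the shared trunk so only the dueling heads get gradient updates.
    Assumes the trunk is exposed as `net.body` (hypothetical attribute name)."""
    for p in net.body.parameters():
        p.requires_grad = not frozen

# e.g. train heads only for the first chunk of self-play, then unfreeze:
# set_body_frozen(online_net, True)
# ... some self-play / training ...
# set_body_frozen(online_net, False)
```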
I'm not looking to create a superhuman agent here, just something maybe 50 to 100 Elo better than what SL provided.
Any advice would be much appreciated.
submitted by /u/GallantGargoyle25