RL Chess Bot Isn’t Learning Anything Useful
Hey guys.
For the past couple of months, I've been working on a chess bot that uses a Dueling DDQN (Dueling Double DQN).
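For context, the network is the standard dueling split: a shared body feeding separate value and advantage streams, combined as Q = V + A - mean(A). A rough PyTorch sketch of what I mean (layer sizes and the board/move encodings below are placeholders, not my exact setup):

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Shared trunk -> separate state-value and advantage streams."""
    def __init__(self, n_features: int = 773, n_actions: int = 4672, hidden: int = 512):
        # n_features / n_actions are placeholder sizes for the board encoding
        # and the move-index space, not my actual numbers.
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, 1))
        self.advantage = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(), nn.Linear(256, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.body(x)
        v = self.value(h)                       # (B, 1)
        a = self.advantage(h)                   # (B, n_actions)
        # Standard dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).
        return v + a - a.mean(dim=1, keepdim=True)
```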
I started with pure RL training, but the agent just learned to play garbage moves and kept hanging pieces.
So I decided to try some supervised learning before diving into RL. After training on a few million positions from masters' games, the model can crush Stockfish Level 3 (around 1300 Elo, if I'm not mistaken).
However, when I load the SL model's weights into my RL pipeline… everything crumbles. Maximum Q-values stay stuck around 2.2, gradients (before clipping) sit at 60 to 100, and after around 75k self-play games the model is back to playing garbage.
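For concreteness, the update itself is just the standard Double-DQN step with gradient-norm clipping; a simplified sketch (the batch layout, Huber loss, and clip threshold here are illustrative, and legal-move masking is omitted):

```python
import torch
import torch.nn.functional as F

def ddqn_update(online_net, target_net, optimizer, batch, gamma=0.99, max_grad_norm=10.0):
    """One Double-DQN step with gradient-norm clipping (hypothetical batch layout)."""
    # states/next_states: float tensors, actions: long, rewards/dones: float 0/1.
    states, actions, rewards, next_states, dones = batch

    with torch.no_grad():
        # Online net selects the argmax action, target net evaluates it (Double DQN).
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    q = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q, targets)

    optimizer.zero_grad()
    loss.backward()
    # This is where I'm seeing pre-clip norms in the 60-100 range.
    grad_norm = torch.nn.utils.clip_grad_norm_(online_net.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()
```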
I tried seeding the replay buffer with positions from masters’ games, and that seemed to help a bit at first, but it devolved into random piece shuffling yet again.
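By seeding I mean pre-filling the buffer with transitions extracted from master games before any self-play happens, roughly like this (buffer size, seed count, and transition format are placeholders):

```python
from collections import deque

# Replay buffer of (state, action, reward, next_state, done) tuples.
replay_buffer = deque(maxlen=500_000)  # capacity is a placeholder

def seed_buffer(master_transitions, n_seed=100_000):
    """Pre-fill the buffer with transitions derived from master games."""
    for i, transition in enumerate(master_transitions):
        if i >= n_seed:
            break
        replay_buffer.append(transition)
```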
I lowered the learning rate, implemented Polyak averaging, and tried a whole slew of other modifications, but nothing seems to work out.
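By Polyak averaging I mean soft target-network updates instead of periodic hard copies, along these lines (the tau here is a placeholder, not my tuned value):

```python
import torch

@torch.no_grad()
def polyak_update(online_net: torch.nn.Module, target_net: torch.nn.Module, tau: float = 0.005):
    """Soft target update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_online)
```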
I understand that Dueling DDQN is not the best choice for chess, and that actor-critic methods would serve me much better, but I’m doing this as a learning exercise and would like to see how far I can take it.
Is there anything else I should try? Perhaps freezing the weights of the body of the neural network for a while? Or should I continue training for another 100k games and see what happens?
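By freezing the body I mean something like this (a sketch that assumes the trunk is exposed as net.body, as in the architecture sketch above):

```python
def set_body_frozen(net, frozen: bool = True):
    """Freeze/unfreeze the shared trunk so only the dueling heads get gradient updates.
    Assumes the trunk is exposed as `net.body` (hypothetical attribute name)."""
    for p in net.body.parameters():
        p.requires_grad = not frozen

# e.g. train heads only for the first chunk of self-play, then unfreeze:
# set_body_frozen(online_net, True)
# ... some self-play / training ...
# set_body_frozen(online_net, False)
```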
I'm not looking to create a superhuman agent here, just something maybe 50 to 100 Elo better than what SL provided.
Any advice would be much appreciated.
submitted by /u/GallantGargoyle25