DQN with Catastrophic Forgetting?
Hi everyone, happy new year! I have a project where I'm training a DQN on pricing and stock decisions. Unfortunately, I seem to be running into some kind of forgetting. When I train with a purely random policy (100% exploration rate) and then evaluate it greedily, the agent actually reaches values better than the fixed policy. The problem arises when I let it train beyond that point, especially after […]
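For context, here is a rough sketch of the setup I'm describing (simplified, not my actual code; the network and helper names are just placeholders). The idea is: phase 1 trains while acting with epsilon = 1.0 (pure random actions, the network still learns from the replay buffer), phase 2 evaluates with epsilon = 0.0 (greedy):

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Placeholder Q-network mapping a state vector to one Q-value per action."""
    def __init__(self, n_states: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def select_action(qnet: QNet, state: torch.Tensor, epsilon: float, n_actions: int) -> int:
    # Epsilon-greedy action selection:
    #   epsilon = 1.0 -> the "pure random exploration" training phase
    #   epsilon = 0.0 -> the greedy evaluation phase
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(qnet(state).argmax().item())
```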