DQN Maze Solver Converging to Horrible Policy
I am teaching a robot to “solve” a maze using DQN. For weeks now it has been converging to about the worst policy it could: driving backwards into a wall no matter what and accruing enormous negative rewards. I have tweaked an enormous number of variables and hyperparameters, changed the neural network size, drastically altered the reward structure in various ways, tried different state inputs, added tons of initial exploration, given it memory, made the optimal policy […]
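For reference, the exploration schedule and the Q-learning update mentioned above usually look something like the sketch below in a standard DQN. This is a minimal, hypothetical illustration, not the code from this project. It assumes epsilon-greedy exploration with a linear decay and the usual one-step target. The names `epsilon_at`, `select_action` and `dqn_target`, and all default values, are made up for the example.

```python
import random


def epsilon_at(step: int, eps_start: float = 1.0, eps_end: float = 0.05,
               decay_steps: int = 10_000) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps (hypothetical defaults)."""
    if step >= decay_steps:
        return eps_end
    frac = step / decay_steps
    return eps_start + frac * (eps_end - eps_start)


def select_action(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy: pick a random action with probability epsilon, otherwise argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_target(reward: float, next_q: list[float], done: bool,
               gamma: float = 0.99) -> float:
    """One-step DQN target r + gamma * max_a' Q_target(s', a').

    At terminal transitions (done=True) there is no bootstrap term,
    so the target is just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q)
```

One detail worth checking in a setup like this is the `done` handling. If the target keeps bootstrapping past terminal states, or if transitions are stored with the wrong `done` flag, the learned Q-values can drift in ways that no amount of reward shaping fixes.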