Is convergence always dependent on initial exploration?
I’m new to RL and have been trying to teach a simulated robot to navigate randomly generated mazes using DQN. Sometimes a run quickly diverges into a terrible policy where the robot just slams into walls, but maybe 1/3 of the time it actually learns a pretty decent policy. I’m not changing the code at all; simply rerunning the same program produces drastically different behavior.
My question is this:
Is this unreliability an inherent aspect of DQN, or is there something flawed with my code / reward structure that is likely causing this inconsistent training behavior?
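One diagnostic step before blaming DQN itself: pin every random seed so reruns are comparable, and see whether the good/bad split tracks the seed. The sketch below is a hypothetical helper (the name `seed_everything` and the choice of RNGs are assumptions, not from the post); a typical maze/DQN loop draws randomness from `random` and NumPy, and if PyTorch is in use you would also call `torch.manual_seed`.

```python
import random
import numpy as np

def seed_everything(seed: int) -> None:
    # Hypothetical helper: pin the RNGs a typical DQN loop touches
    # (maze generation, epsilon-greedy exploration, replay sampling)
    # so run-to-run differences come from the algorithm, not the seed.
    # If you use PyTorch, also add: torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
first = [np.random.rand() for _ in range(3)]

seed_everything(42)
second = [np.random.rand() for _ in range(3)]

# Re-seeding reproduces the exact same draws, so any remaining
# variance between runs is coming from the training process itself.
assert first == second
```

If two runs with the same seed still diverge differently, the nondeterminism is coming from somewhere else (e.g. unseeded environment resets or GPU nondeterminism), which narrows the search considerably.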
submitted by /u/aidan_adawg