Is convergence always dependent on initial exploration?
I’m new to RL and have been trying to teach a simulated robot to navigate randomly generated mazes using DQN. Sometimes a run quickly diverges into a terrible policy where the robot just slams into walls, but maybe 1/3 of the time it actually learns a pretty decent policy. I’m not changing the code at all; simply rerunning the same program produces drastically different behavior.
My question is this:
Is this unreliability an inherent aspect of DQN, or is there something flawed with my code / reward structure that is likely causing this inconsistent training behavior?
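One diagnostic step before blaming DQN itself: pin every random seed so reruns are comparable, and see whether the good/bad split tracks the seed. The sketch below is a hypothetical helper (the name `seed_everything` and the choice of RNGs are assumptions, not from the post); a typical maze/DQN loop draws randomness from `random` and NumPy, and if PyTorch is in use you would also call `torch.manual_seed`.

```python
import random
import numpy as np

def seed_everything(seed: int) -> None:
    # Hypothetical helper: pin the RNGs a typical DQN loop touches
    # (maze generation, epsilon-greedy exploration, replay sampling)
    # so run-to-run differences come from the algorithm, not the seed.
    # If you use PyTorch, also add: torch.manual_seed(seed)
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
first = [np.random.rand() for _ in range(3)]

seed_everything(42)
second = [np.random.rand() for _ in range(3)]

# Re-seeding reproduces the exact same draws, so any remaining
# variance between runs is coming from the training process itself.
assert first == second
```

If two runs with the same seed still diverge differently, the nondeterminism is coming from somewhere else (e.g. unseeded environment resets or GPU nondeterminism), which narrows the search considerably.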
submitted by /u/aidan_adawg