DQN for Solving a Maze in Less than 10 Minutes of Training
Is it possible to train a DQN to solve a maze with non-convex obstacles in a long-horizon navigation task (in 10 minutes or less)?
The rules are:
- You cannot use old data, except for the replay buffer
- The inputs are only the x and y coordinates of the agent and its distance to the goal
- Step size should not exceed 2% of the total maze size
- You must start from the same initial state
- The implementation has to be a DQN
- The training should take no longer than 10 minutes
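Under these rules the observation is just a 3-vector and each move is length-bounded. A minimal sketch of that state/step interface (the maze size, goal position, and function names are hypothetical, just to make the constraints concrete):

```python
import math

MAZE_SIZE = 1.0              # hypothetical: maze spans [0, 1] x [0, 1]
MAX_STEP = 0.02 * MAZE_SIZE  # rule: step size must not exceed 2% of maze size
GOAL = (0.9, 0.9)            # hypothetical goal position

def observation(x, y):
    """State fed to the DQN: x, y, and the distance to the goal."""
    dist = math.hypot(x - GOAL[0], y - GOAL[1])
    return (x, y, dist)

def step(x, y, dx, dy):
    """Apply one move, clamping its length to the 2% step-size rule."""
    norm = math.hypot(dx, dy)
    if norm > MAX_STEP:
        dx, dy = dx * MAX_STEP / norm, dy * MAX_STEP / norm
    # keep the agent inside the maze bounds
    nx = min(max(x + dx, 0.0), MAZE_SIZE)
    ny = min(max(y + dy, 0.0), MAZE_SIZE)
    return nx, ny
```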
I have tried Double DQN, Noisy DQN, and prioritized experience replay. I have tried different combinations of rewards (a small negative reward for every step, a high positive reward for reaching the goal, a high negative reward for hitting an obstacle). I even tried shaping the reward in terms of the distance to the goal.
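The reward combinations above can be sketched as one function (the magnitudes are illustrative assumptions, since the post does not give actual numbers):

```python
def reward(reached_goal, hit_obstacle, dist_to_goal, shaped=False):
    """Reward schemes tried: per-step penalty, goal bonus, obstacle penalty,
    and an optional distance-based shaping term (magnitudes are made up)."""
    if reached_goal:
        return 100.0           # high positive reward for reaching the goal
    if hit_obstacle:
        return -100.0          # high negative reward for hitting an obstacle
    if shaped:
        return -dist_to_goal   # reward in terms of the distance to the goal
    return -1.0                # negative reward for every step
```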
I tried different epsilon-greedy decay methods.
No matter what I did, the agent just could not learn to reach the goal.
I think the main problem is that the agent doesn't consistently reach the goal during training; in some runs it never reaches it at all. How can I solve this?
Overall, is this problem even solvable, especially given the time constraint? If so, how? Any advice, please?
submitted by /u/Now200