DQN for Solving a Maze in Less than 10 Minutes of Training

Is it possible to train a DQN to solve a maze with non-convex obstacles, in a long-horizon navigation task, in 10 minutes or less?

The rules are:

  • You cannot reuse old data, except through the replay buffer
  • The inputs are only the agent's x and y coordinates and its distance to the goal
  • The step size must not exceed 2% of the total maze size
  • You must start every episode from the same initial state
  • The implementation must be a DQN
  • Training must take no longer than 10 minutes

I have tried Double DQN, Noisy DQN, and prioritized experience replay. I have tried different combinations of rewards (a small negative reward for every step, a large positive reward for reaching the goal, a large negative reward for hitting an obstacle). I even tried defining the reward in terms of the distance to the goal.
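Concretely, the reward shaping I tried looks roughly like this (the constants and the goal radius here are illustrative, not the exact values I used):

```python
import math

def shaped_reward(agent_xy, goal_xy, hit_obstacle, goal_radius=0.02):
    """Reward shaping as described above: per-step penalty, large bonus at the
    goal, large penalty on collision, plus a distance-to-goal term.
    All magnitudes are illustrative."""
    dist = math.dist(agent_xy, goal_xy)
    if hit_obstacle:
        return -10.0          # large negative reward for hitting an obstacle
    if dist < goal_radius:
        return 10.0           # large positive reward for reaching the goal
    return -0.01 - dist       # small step penalty plus distance shaping
```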

I also tried different epsilon-greedy decay schedules.

No matter what I did, the agent just could not learn to reach the goal.

I think the main problem is that the agent doesn’t always reach the goal during training. Sometimes, it does not reach it at all. How can I solve this?

Overall, is this problem even solvable, especially under the time constraint? If so, how? Any advice, please?

submitted by /u/Now200