I trained a DQN agent to solve drone intercept cost optimization — here’s what it figured out on its own

Built a drone interception environment from scratch in Pygame — no OpenAI Gym dependency. State vector is 10-dimensional, tracking 2 nearest drones with angle error, predicted position 15 steps ahead, distance, and vertical speed.
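For anyone curious what that 10-dim state might look like concretely: here's a minimal sketch, assuming 5 features per drone × 2 nearest drones, with linear extrapolation for the 15-step-ahead prediction. All the names (`turret`, `drone` dicts, `state_vector`) are my assumptions, not the author's code.

```python
import math
import numpy as np

def drone_features(turret, drone, lookahead=15):
    """Hypothetical 5-feature encoding for one drone:
    angle error, predicted x/y position `lookahead` steps ahead,
    distance, and vertical speed."""
    # Predict future position by linear extrapolation of current velocity.
    px = drone["x"] + drone["vx"] * lookahead
    py = drone["y"] + drone["vy"] * lookahead
    dx, dy = drone["x"] - turret["x"], drone["y"] - turret["y"]
    dist = math.hypot(dx, dy)
    # Angle error: bearing to the drone minus current turret heading,
    # wrapped to [-pi, pi].
    angle_err = math.atan2(dy, dx) - turret["heading"]
    angle_err = (angle_err + math.pi) % (2 * math.pi) - math.pi
    return [angle_err, px, py, dist, drone["vy"]]

def state_vector(turret, drones, lookahead=15):
    """10-dim state: 5 features for each of the 2 nearest drones."""
    nearest = sorted(
        drones,
        key=lambda d: math.hypot(d["x"] - turret["x"], d["y"] - turret["y"]),
    )[:2]
    feats = []
    for d in nearest:
        feats.extend(drone_features(turret, d, lookahead))
    # Zero-pad if fewer than 2 drones are currently alive.
    feats += [0.0] * (10 - len(feats))
    return np.asarray(feats, dtype=np.float32)
```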

Reward structure is where it gets interesting:

  • Hit: +10
  • Building destroyed: -20
  • Shot fired: -0.5
  • Drone escaped: -5

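As a sketch, the reward table above maps directly to a per-step function (event flag names are mine, not from the original env):

```python
def step_reward(hit=False, building_destroyed=False, fired=False, escaped=False):
    """Per-step reward using the values from the post.

    Events can co-occur in one step (e.g. fire and hit), so the
    terms are summed rather than mutually exclusive.
    """
    r = 0.0
    if hit:
        r += 10.0
    if building_destroyed:
        r -= 20.0
    if fired:
        r -= 0.5
    if escaped:
        r -= 5.0
    return r
```

Note that firing and hitting in the same step nets +9.5, so a shot only pays off if its hit probability is high enough to beat the -0.5 cost in expectation.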
The -0.5 firing penalty forces the agent to learn ammo conservation. What emerged: under low swarm density it fires aggressively; under high density it becomes selective. Past a certain swarm density it fails regardless of strategy, which is honestly the most interesting finding.

Trains in ~2 minutes on CPU. 150 episodes, epsilon-greedy, target network updated every 10 episodes.
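The training schedule can be sketched as the outer loop below. The env transition, the linear Q-function, and the action count are toy stand-ins (the post doesn't specify network architecture or action space); what it does reflect from the post is 150 episodes, epsilon-greedy exploration, and a hard target-network sync every 10 episodes.

```python
import random
import numpy as np

STATE_DIM, N_ACTIONS = 10, 4            # action count is assumed
rng = np.random.default_rng(0)

# Toy linear Q-function and its frozen target copy.
q_weights = rng.normal(scale=0.01, size=(STATE_DIM, N_ACTIONS))
target_weights = q_weights.copy()

def q_values(w, s):
    return s @ w

epsilon, eps_min, eps_decay = 1.0, 0.05, 0.97
gamma, lr = 0.99, 1e-3

for episode in range(150):
    s = rng.normal(size=STATE_DIM).astype(np.float32)
    for t in range(50):                  # toy episode length
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            a = int(np.argmax(q_values(q_weights, s)))
        # Stand-in transition and reward (not the Pygame intercept env).
        s2 = rng.normal(size=STATE_DIM).astype(np.float32)
        r = -0.5 if a == 0 else 0.0      # e.g. action 0 = fire
        # One-step TD update, bootstrapping from the frozen target net.
        td_target = r + gamma * float(np.max(q_values(target_weights, s2)))
        td_error = td_target - float(q_values(q_weights, s)[a])
        q_weights[:, a] += lr * td_error * s
        s = s2
    epsilon = max(eps_min, epsilon * eps_decay)
    # Hard-sync the target network every 10 episodes.
    if (episode + 1) % 10 == 0:
        target_weights = q_weights.copy()
```

The frozen target network is what keeps the bootstrapped TD target from chasing its own updates within those 10-episode windows.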

Curious what reward shaping others have tried for similar problems.

submitted by /u/AfraidRub1863