I trained a DQN agent to solve drone intercept cost optimization — here’s what it figured out on its own
Built a drone interception environment from scratch in Pygame, with no OpenAI Gym dependency. The state vector is 10-dimensional and tracks the 2 nearest drones: angle error, predicted position 15 steps ahead, distance, and vertical speed. Reward structure is where it gets interesting:
The -0.5 firing penalty pushes the agent to learn ammo conservation. What emerged: under low swarm density it fires aggressively, and under high density it becomes selective. Past a certain swarm threshold it fails regardless, which is honestly the most interesting finding.

Training takes ~2 minutes on CPU: 150 episodes, epsilon-greedy exploration, target network updated every 10 episodes.

Curious what reward shaping others have tried for similar problems.

submitted by /u/AfraidRub1863
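
For anyone who wants to poke at the setup described above, here is a rough sketch of the firing penalty, epsilon-greedy selection, and the target-sync schedule in plain Python. Only the -0.5 penalty, the 150 episodes, and the 10-episode sync interval come from the post. The hit reward, the epsilon schedule, and all function names are assumptions made up for illustration, not the original implementation.

```python
import random

# Values stated in the post.
FIRE_PENALTY = -0.5        # cost of every shot
NUM_EPISODES = 150         # training length
TARGET_SYNC_EVERY = 10     # episodes between target-network updates

# Assumptions for illustration only; the post does not give these.
HIT_REWARD = 1.0                   # hypothetical reward for an intercept
EPS_START, EPS_END = 1.0, 0.05     # hypothetical linear epsilon schedule


def step_reward(fired: bool, hit: bool) -> float:
    """Reward for one step: each shot costs FIRE_PENALTY, a hit adds HIT_REWARD."""
    reward = 0.0
    if fired:
        reward += FIRE_PENALTY
        if hit:
            reward += HIT_REWARD
    return reward


def epsilon_at(episode: int) -> float:
    """Linearly anneal epsilon from EPS_START to EPS_END over training."""
    frac = episode / (NUM_EPISODES - 1)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def should_sync_target(episode: int) -> bool:
    """True on episodes where the target network would copy the online weights."""
    return (episode + 1) % TARGET_SYNC_EVERY == 0
```

Under these assumed numbers, a shot with hit probability p has expected reward p * 1.0 - 0.5, so firing only pays off when p is above 0.5. That is one plausible reading of why the agent gets selective when targets are harder to hit, though the real reward values may differ.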