I’m training an AI to drive Indianapolis 500 in DOSBox using reinforcement learning
Hey everyone,
I’ve been working on a reinforcement learning project for the old DOS game **Indianapolis 500**, running through DOSBox. The goal is to train an AI driver that can learn to leave the pit area, stay on track, complete laps, recover from mistakes, and eventually race faster than I can drive it myself.
Video here:
Indianapolis 500 Game – AI training
The setup uses a mix of:
– **Pixel input** from the DOSBox window
– **Keyboard control** for throttle, brake, left, right, etc.
– **Game-memory telemetry** read directly from DOSBox memory
– **Behavior cloning** from my own recorded driving
– **Recurrent PPO**
– A custom **Transformer + LSTM PPO policy**
– A live reward dashboard so I can see what the agent is being rewarded or punished for
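For anyone wondering how keyboard control can be exposed to a policy, one common approach is a small discrete action table where each index means "hold these keys for one step." The sketch below assumes that approach; the key names and combinations are hypothetical placeholders, not my actual bindings, and it only computes which keys to press or release (sending them to DOSBox is left out).

```python
# HYPOTHETICAL discrete action table: index -> keys held down for one step.
# Key names are placeholders, not the project's real bindings.
ACTIONS: list[frozenset[str]] = [
    frozenset(),                    # 0: coast
    frozenset({"up"}),              # 1: throttle
    frozenset({"down"}),            # 2: brake
    frozenset({"up", "left"}),      # 3: throttle + steer left
    frozenset({"up", "right"}),     # 4: throttle + steer right
    frozenset({"left"}),            # 5: steer left
    frozenset({"right"}),           # 6: steer right
]


def key_transitions(prev_action: int, next_action: int) -> tuple[set[str], set[str]]:
    """Return (keys_to_press, keys_to_release) when switching actions.

    Sending only the difference avoids re-pressing keys that are already
    held, which matters for games that poll key-down state every frame.
    """
    prev_keys = ACTIONS[prev_action]
    next_keys = ACTIONS[next_action]
    return set(next_keys - prev_keys), set(prev_keys - next_keys)
```

For example, going from plain throttle (1) to throttle + left (3) only needs a "left" key-down; "up" stays held.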
The telemetry currently includes things like:
```text
speed
position/progress around the track
lap completion
wrong direction detection
wall contact / crash detection
damage / hard crash signals
```
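As a rough idea of what the decoding side of memory telemetry can look like: assuming you already have a raw `bytes` snapshot of the relevant region of DOSBox's emulated memory, Python's `struct` module can unpack fields at fixed offsets. The offsets, field widths, and units below are entirely hypothetical; the real addresses would have to be found with a memory scanner or the DOSBox debugger.

```python
import struct
from dataclasses import dataclass

# HYPOTHETICAL layout: offsets, formats, and units are placeholders,
# not Indianapolis 500's real memory map.
OFFSETS = {
    "speed": (0x00, "<H"),         # unsigned 16-bit, assumed mph
    "track_pos": (0x02, "<H"),     # unsigned 16-bit progress counter
    "wrong_way": (0x04, "<B"),     # 0/1 flag
    "wall_contact": (0x05, "<B"),  # 0/1 flag
    "damage": (0x06, "<B"),        # damage level
}


@dataclass
class Telemetry:
    speed: int
    track_pos: int
    wrong_way: bool
    wall_contact: bool
    damage: int


def parse_telemetry(snapshot: bytes, base: int = 0) -> Telemetry:
    """Decode one telemetry frame from a raw memory snapshot starting at `base`."""
    values = {
        name: struct.unpack_from(fmt, snapshot, base + off)[0]
        for name, (off, fmt) in OFFSETS.items()
    }
    return Telemetry(
        speed=values["speed"],
        track_pos=values["track_pos"],
        wrong_way=bool(values["wrong_way"]),
        wall_contact=bool(values["wall_contact"]),
        damage=values["damage"],
    )
```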
Lap detection is not done with OCR. Instead, the program watches a memory value that represents track position. When that value wraps from a high value back to a low value and then climbs past a confirmation threshold near the start/finish area, it counts a completed lap. That made lap rewards much more reliable than trying to infer laps from pixels.
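The wrap-then-confirm logic is essentially a tiny state machine. Here is a minimal sketch of it; the `high`, `low`, and `confirm` thresholds are hypothetical and depend on the game's actual position range. The backward-wrap cancel (so reversing across the line can't count a lap) is my own addition, not something described above.

```python
from typing import Optional


class LapCounter:
    """Count laps from a track-position value that wraps at the finish line.

    A lap is armed when the position jumps from above `high` to below `low`
    (the wrap), and is only counted once the position then climbs past
    `confirm`, so jitter right at the line cannot double-count.
    All thresholds are hypothetical.
    """

    def __init__(self, high: int, low: int, confirm: int) -> None:
        self.high = high
        self.low = low
        self.confirm = confirm
        self.prev: Optional[int] = None
        self.armed = False
        self.laps = 0

    def update(self, pos: int) -> bool:
        """Feed one position reading; return True on the frame a lap is confirmed."""
        completed = False
        if self.prev is not None and self.prev > self.high and pos < self.low:
            self.armed = True   # wrapped forward across the start/finish line
        elif self.prev is not None and self.prev < self.low and pos > self.high:
            self.armed = False  # wrapped backward across the line: cancel
        elif self.armed and pos > self.confirm:
            self.armed = False
            self.laps += 1
            completed = True
        self.prev = pos
        return completed
```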
The reward system currently gives positive reward for:
```text
speed
forward progress
staying on track
finishing laps
finishing laps quickly
```
And penalties for:
```text
going off track
wall contact
wrong direction
heavy crashes
sitting under 10 mph for too long
```
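To make the shape of this concrete, here is a hedged sketch of a reward function with those terms that also returns a per-term breakdown, which is the kind of thing a live reward dashboard can plot. Every weight, the 60 s reference lap time, and the 30-step grace period for slow driving are hypothetical; only the list of terms comes from the setup described above.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    speed_mph: float
    progress: float       # forward progress this step, in track-position units
    on_track: bool
    wall_contact: bool
    wrong_way: bool
    heavy_crash: bool
    lap_completed: bool
    lap_time_s: float     # only meaningful when lap_completed is True
    slow_steps: int       # consecutive steps spent under 10 mph


# HYPOTHETICAL weights: the terms match the lists above, the magnitudes do not
# come from the project.
W = {
    "speed": 0.01, "progress": 0.1, "on_track": 0.05,
    "lap": 10.0, "lap_time_ref": 60.0, "lap_time": 0.2,
    "off_track": 0.5, "wall": 1.0, "wrong_way": 1.0,
    "heavy_crash": 5.0, "slow": 0.05, "slow_grace": 30,
}


def reward(s: StepInfo) -> tuple[float, dict[str, float]]:
    """Return the total reward and a per-term breakdown (e.g. for a dashboard)."""
    terms = {
        "speed": W["speed"] * s.speed_mph,
        "progress": W["progress"] * max(s.progress, 0.0),
        "on_track": W["on_track"] if s.on_track else -W["off_track"],
        "wall": -W["wall"] if s.wall_contact else 0.0,
        "wrong_way": -W["wrong_way"] if s.wrong_way else 0.0,
        "heavy_crash": -W["heavy_crash"] if s.heavy_crash else 0.0,
        "slow": -W["slow"] if s.slow_steps > W["slow_grace"] else 0.0,
        "lap": 0.0,
    }
    if s.lap_completed:
        # Fixed lap bonus plus extra reward for each second under the reference time.
        terms["lap"] = W["lap"] + W["lap_time"] * max(W["lap_time_ref"] - s.lap_time_s, 0.0)
    return sum(terms.values()), terms
```

Clamping progress at zero keeps reversing from being rewarded; the wrong-direction penalty then handles it explicitly.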
I also recorded around 17 human-driven laps and trained a behavior cloning model on them. It helped the agent learn the basic shape of the track, but it also exposed an interesting problem: if I overweight rare actions like steering right, the model starts turning right too much and crashes. So now I’m moving more toward PPO fine-tuning, where the agent can improve from telemetry rewards instead of just copying my driving.
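One generic way to soften the overweighting problem (not what I'm currently doing, just an option) is to temper inverse-frequency class weights: raise them to a power below 1 and cap them, so a rare action like "steer right" gets some boost without dominating. The `power` and `max_weight` defaults below are hypothetical; each sample's cross-entropy would be multiplied by the weight of its recorded action.

```python
from collections import Counter


def tempered_class_weights(
    actions: list[int], power: float = 0.5, max_weight: float = 3.0
) -> dict[int, float]:
    """Per-action loss weights for behavior cloning.

    Full inverse-frequency weighting (power=1.0) can push a rare action far
    above what the demonstrations support. Raising the inverse frequency to
    a power < 1 and capping the result tempers that. Weights are normalized
    so the most common action gets weight 1.0.
    """
    counts = Counter(actions)
    most_common = max(counts.values())
    return {
        a: min((most_common / c) ** power, max_weight)
        for a, c in counts.items()
    }
```

With 90 "throttle" samples and 10 "throttle + right" samples, full inverse weighting would give the rare action 9x weight; the default tempering gives 3x.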
The current next step is training the Transformer+LSTM PPO agent longer, with episode resets on heavy crashes and on long stretches of sitting still, so it learns that “crash and sit still” is a dead end.
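A minimal sketch of that reset rule, assuming a per-step damage reading and the 10 mph threshold from the penalty list; the step count and damage cutoff are hypothetical. The `reason` string is just for logging.

```python
from dataclasses import dataclass


@dataclass
class ResetRule:
    """Decide when to end an episode early. Thresholds are hypothetical."""
    slow_mph: float = 10.0      # same "under 10 mph" threshold as the penalty
    max_slow_steps: int = 150   # assumed; e.g. ~10 s at 15 decisions per second
    crash_damage: int = 50      # assumed damage level treated as a heavy crash
    slow_steps: int = 0         # running count of consecutive slow steps

    def step(self, speed_mph: float, damage: int) -> tuple[bool, str]:
        """Return (done, reason) after one environment step."""
        if damage >= self.crash_damage:
            self.slow_steps = 0
            return True, "heavy_crash"
        # Count consecutive slow steps; any faster step resets the counter.
        self.slow_steps = self.slow_steps + 1 if speed_mph < self.slow_mph else 0
        if self.slow_steps >= self.max_slow_steps:
            self.slow_steps = 0
            return True, "dormant"
        return False, ""
```

If the environment follows the Gymnasium API, reporting these resets as `terminated=True` rather than `truncated=True` means the value target does not bootstrap past them, which is what actually makes them a dead end from the agent's point of view.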
It’s still very experimental, but it’s been really fun seeing an old racing sim become a reinforcement learning environment. Any feedback on reward design, recurrent PPO setup, or better ways to combine behavior cloning with PPO would be very welcome.
submitted by /u/Few-Night-4811