I Trained an AI to Beat Final Fight… Here’s What Happened

Hey everyone,

I’ve been experimenting with Behavior Cloning on a classic arcade game (Final Fight), and I wanted to share the results and get some feedback from the community.

The setup is fairly simple: I trained an agent purely from demonstrations (no reward shaping initially), then evaluated how far it could get through the first stage. I also plan to extend this with GAIL + PPO to see how much performance improves beyond pure imitation.
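For anyone curious what "training purely from demonstrations" looks like at the data-collection step, here's a minimal sketch. The `StubEnv` class is a toy stand-in I made up for the real emulator interface; the point it illustrates is the pairing convention — store the observation seen *before* stepping, alongside the action chosen from that observation:

```python
# Hedged sketch: collecting (obs, action) pairs for behavior cloning.
# StubEnv is a hypothetical stand-in for the emulator; its observation
# is just a step counter so the alignment is easy to see.

class StubEnv:
    """Toy stand-in for the emulator environment."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t  # next observation

def record_episode(env, actions):
    """Pair obs_t with action_t (avoids off-by-one alignment bugs)."""
    pairs = []
    obs = env.reset()
    for a in actions:
        pairs.append((obs, a))  # obs seen when action a was chosen
        obs = env.step(a)       # becomes the obs for the *next* pair
    return pairs

print(record_episode(StubEnv(), ["punch", "jump"]))
# [(0, 'punch'), (1, 'jump')]
```

The BC loss is then just supervised learning on these pairs; the subtle bugs tend to live in this recording step rather than in the training loop.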

A couple of interesting challenges came up:

  • Action space remapping (MultiBinary → emulator input)
  • Trajectory alignment issues (obs/action offset bugs 😅)
  • LSTM policy behaving differently under evaluation vs manual rollout
  • Managing rollouts efficiently without loading everything into memory
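On the first bullet: the remapping itself is mechanical once you know the emulator's button order, but getting that order wrong fails silently. A minimal sketch of the idea below — note the `BUTTONS` list here is hypothetical; the real order comes from your emulator integration's config, not from this example:

```python
# Hedged sketch: mapping named button presses onto a MultiBinary action
# vector in the emulator's expected button order. BUTTONS is a made-up
# ordering for illustration -- substitute the order your integration defines.

BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT"]  # hypothetical

def to_multibinary(pressed):
    """Convert a set of pressed button names to a 0/1 action vector."""
    return [1 if b in pressed else 0 for b in BUTTONS]

print(to_multibinary({"RIGHT", "A"}))
# [0, 1, 0, 0, 0, 0, 0, 1]
```

The inverse mapping (vector back to names) is handy for sanity-checking recorded demonstrations against what you actually pressed.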

The agent can already make some progress, but still struggles with consistency and survival.

I’d love to hear thoughts on:

  • Improving BC performance with limited trajectories
  • Best practices for transitioning BC → PPO
  • Handling partial observability in these environments

Here’s the code if you want to see the full process and results:
notebooks-rl/final_fight at main · paulo101977/notebooks-rl

Any feedback is very welcome!

submitted by /u/AgeOfEmpires4AOE4
