[P] Flight Recorder – Replay debugger for AI agents (skip successful steps, re-run only failures)
I spent way too much time last week debugging a multi-agent workflow. Every time I fixed a bug, I had to re-run the entire pipeline – burning LLM credits and waiting for the same API calls to execute again.
So I built Flight Recorder – a replay debugger for AI agents.
**What it does:**
– Records every agent call (inputs, outputs, errors)
– Shows you the root cause when something fails
– Lets you replay from the exact failure point
– Caches everything that already worked
**Example:**
from flight_recorder import FlightRecorder
fr = FlightRecorder()
u/fr.register(“crm_lookup”)
def crm_lookup(email):
return db.query(email) # Slow API
u/fr.register(“scorer”)
def score_lead(contact):
assert contact is not None # BUG
return contact[“score”]
fr.run(pipeline, “unknown@example.com“)
# Crashes at scorer
$ flight-recorder debug last
Root cause: crm_lookup returned None
# Fix the bug, then:
$ flight-recorder replay last
# crm_lookup is cached (doesn’t re-run)
# scorer re-runs with your fix
# SUCCESS
**Real example:** 5-agent pipeline with GPT-4o-mini
– Without FR: Re-run everything = 90 seconds
– With FR: Replay from failure = 5 seconds
It’s open source, MIT licensed, works with LangChain/CrewAI/custom agents.
GitHub: https://github.com/whitepaper27/Flight-Recorder
PyPI: `pip install flight-recorder`
Demo GIF: https://github.com/whitepaper27/Flight-Recorder/blob/main/demo.gif
I’d love feedback, especially if you’ve debugged complex agent workflows before!
submitted by /u/coolsoftcoin
[link] [comments]