[P] Flight Recorder – Replay debugger for AI agents (skip successful steps, re-run only failures)

I spent way too much time last week debugging a multi-agent workflow. Every time I fixed a bug, I had to re-run the entire pipeline – burning LLM credits and waiting for the same API calls to execute again.

So I built Flight Recorder – a replay debugger for AI agents.

**What it does:**

- Records every agent call (inputs, outputs, errors)
- Shows you the root cause when something fails
- Lets you replay from the exact failure point
- Caches everything that already worked
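For anyone curious how record-and-replay of this kind can work, here is a minimal sketch. It is not Flight Recorder's actual code: `MiniRecorder`, its `log` and `cache` attributes, and the hashing scheme are all made up for illustration. The idea is simply that successful calls are cached by step name and arguments, while failures are logged but never cached, so a second run only re-executes what failed.

```python
import functools
import hashlib
import json


class MiniRecorder:
    """Toy record/replay cache (hypothetical, not the real library)."""

    def __init__(self):
        self.log = []    # one entry per call: step name, status, output or error
        self.cache = {}  # results of successful calls, keyed by step + arguments

    def _key(self, name, args, kwargs):
        # Hash the step name together with its JSON-serialised arguments.
        payload = json.dumps([name, args, kwargs], sort_keys=True, default=repr)
        return hashlib.sha256(payload.encode()).hexdigest()

    def register(self, name):
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                key = self._key(name, args, kwargs)
                if key in self.cache:
                    # Already succeeded with these inputs: skip the real call.
                    self.log.append({"step": name, "status": "cached"})
                    return self.cache[key]
                try:
                    result = fn(*args, **kwargs)
                except Exception as exc:
                    # Failures are logged but not cached, so they re-run later.
                    self.log.append({"step": name, "status": "error", "error": repr(exc)})
                    raise
                self.log.append({"step": name, "status": "ok", "output": result})
                self.cache[key] = result
                return result
            return wrapper
        return decorator


rec = MiniRecorder()
calls = {"lookup": 0}    # counts real (uncached) executions of crm_lookup
fixed = {"value": False}  # stands in for "you edited the code and fixed the bug"


@rec.register("crm_lookup")
def crm_lookup(email):
    calls["lookup"] += 1
    return {"email": email, "score": 7}


@rec.register("scorer")
def score_lead(contact):
    if not fixed["value"]:
        raise KeyError("rating")  # simulated bug
    return contact["score"]


def pipeline(email):
    return score_lead(crm_lookup(email))


try:
    pipeline("a@example.com")  # first run: scorer fails
except KeyError:
    pass

fixed["value"] = True
result = pipeline("a@example.com")  # "replay": crm_lookup comes from the cache
```

After the second call, `crm_lookup` has executed only once. A real tool also has to persist the cache between processes and decide when a cached result is stale, which this sketch ignores.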

**Example:**

    from flight_recorder import FlightRecorder

    fr = FlightRecorder()

    @fr.register("crm_lookup")
    def crm_lookup(email):
        return db.query(email)  # Slow API

    @fr.register("scorer")
    def score_lead(contact):
        assert contact is not None  # BUG: fails when crm_lookup returns None
        return contact["score"]

    # pipeline is assumed to call crm_lookup, then score_lead
    fr.run(pipeline, "unknown@example.com")
    # Crashes at scorer

    $ flight-recorder debug last
    Root cause: crm_lookup returned None

    # Fix the bug, then:
    $ flight-recorder replay last
    # crm_lookup is cached (doesn't re-run)
    # scorer re-runs with your fix
    # SUCCESS
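The `debug` output above points at an upstream step rather than the step that actually crashed. One simple way to get that kind of answer from a recorded trace is sketched below. This is a guess at the general technique, not how Flight Recorder does it: `find_root_cause`, the trace layout, and its keys (`step`, `status`, `output`, `error`) are hypothetical, and the sample trace just mirrors the example in this post.

```python
def find_root_cause(trace):
    """Return a readable guess at the root cause for a recorded trace.

    `trace` is a list of dicts in execution order, each with the hypothetical
    keys "step", "status" ("ok" or "error"), and "output" or "error".
    """
    failed = next((i for i, rec in enumerate(trace) if rec["status"] == "error"), None)
    if failed is None:
        return "No failure recorded"
    # Walk backwards from the failure to the nearest earlier step
    # that "succeeded" but produced a suspicious None output.
    for rec in reversed(trace[:failed]):
        if rec["status"] == "ok" and rec.get("output") is None:
            return f"{rec['step']} returned None"
    # Nothing upstream looks wrong, so blame the failing step itself.
    return f"{trace[failed]['step']} raised {trace[failed]['error']}"


# Illustrative trace shaped like the crm_lookup / scorer example.
trace = [
    {"step": "crm_lookup", "status": "ok", "output": None},
    {"step": "scorer", "status": "error", "error": "AssertionError()"},
]
message = find_root_cause(trace)
print(message)  # crm_lookup returned None
```

A None check is only one heuristic. Empty lists, error strings in outputs, or schema mismatches would need their own rules.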

**Real example:** 5-agent pipeline with GPT-4o-mini

- Without FR: Re-run everything = 90 seconds
- With FR: Replay from failure = 5 seconds

It's open source (MIT licensed) and works with LangChain, CrewAI, and custom agents.

GitHub: https://github.com/whitepaper27/Flight-Recorder

PyPI: `pip install flight-recorder`

Demo GIF: https://github.com/whitepaper27/Flight-Recorder/blob/main/demo.gif

I’d love feedback, especially if you’ve debugged complex agent workflows before!

submitted by /u/coolsoftcoin