Why AI Agent Reliability Depends More on the Harness Than the Model
I keep hearing the same question at every engineering offsite, in every Slack thread, and in every investor pitch: “What’s the best model right now: GPT, Claude, or Gemini?” I have spent the last several months building and debugging agent-based systems, and I think this is the wrong question entirely. The evidence is now overwhelming: what determines whether an AI agent succeeds in production is not the model underneath it, but the infrastructure wrapped around it. I am going to lay out my hypothesis, test […]