[D] Are there REAL success stories of autonomous AI dev agents working reliably in production?

I’m having a serious debate with a colleague, and I want to settle this with actual evidence instead of opinions.

The claim:

That it’s possible today to run orchestrated AI developer agents (multiple agents, coordinated workflows) that can autonomously build and maintain software, under the supervision of a senior AI/dev engineer, without running into unfixable errors or constant breakdowns.

I’m skeptical. He believes it’s already happening.

So I’m looking for real-world examples, not theory:

– Have you actually used autonomous dev agents in production?

– What was the setup? (tools, stack, orchestration method)

– What level of autonomy are we talking about?

– What still breaks?

– Did it scale beyond small experiments or toy projects?

Especially interested in:

– Multi-agent setups (not just Copilot-style assistance)

– Systems that run for extended periods (not one-off demos)

– Cases where human input is minimal but still controlled

If you’ve seen this work (or fail), I’d really appreciate detailed insights.

Trying to separate hype from reality here.

submitted by /u/MegaMillyMansion