ABC-Bench and the Real Test for AI Engineers: Can It Run End-to-End?
ABC-Bench evaluates agentic coding on 224 tasks across real open-source backends, using containerized dependencies and external end-to-end API tests.