SWE-bench February 2026 leaderboard update
SWE-bench is one of the benchmarks that the labs love to list in their model releases. The official leaderboard is infrequently updated, but the maintainers just did a full run of it against the current generation of models. That’s notable because it’s always good to see benchmark results like this that weren’t self-reported by the labs. The fresh results are for their “Bash Only” benchmark, which runs their mini-swe-bench agent (~9,000 lines of […]