Bigger Models Won’t Fix Terminal Agents

LLMs can explain terminals but fail to use them. New research shows that data engineering, not bigger models, drives real gains on Terminal-Bench 2.0.
