Five Fatal Assumptions: Why T-Shirt Sizing Systematically Fails for AI Projects
arXiv:2602.17734v1 Announce Type: new
Abstract: Agile estimation techniques, particularly T-shirt sizing, are widely used in software development for their simplicity and usefulness in scoping work. When applied to artificial intelligence initiatives, especially those involving large language models (LLMs) and multi-agent systems, these methods can produce systematically misleading estimates. This paper presents an evidence-backed analysis of five foundational assumptions implicit in T-shirt sizing that usually hold for traditional software but tend to fail in AI contexts: (1) linear effort scaling, (2) repeatability from prior experience, (3) effort-duration fungibility, (4) task decomposability, and (5) deterministic completion criteria. Drawing on recent research into multi-agent system failures, scaling principles, and the inherent unreliability of multi-turn conversations, we show how AI development violates these assumptions through non-linear performance jumps, complex interaction surfaces, and “tight coupling,” in which a small change in data cascades through the entire stack. In response, we propose Checkpoint Sizing: a more human-centric, iterative approach built on explicit decision gates at which scope and feasibility are reassessed based on what is learned during development rather than what was assumed at the start. This paper is intended for engineering managers, technical leads, and product owners responsible for planning and delivering AI initiatives.