Runtime Burden Allocation for Structured LLM Routing in Agentic Expert Systems: A Full-Factorial Cross-Backend Methodology
arXiv:2604.01235v1 Announce Type: new
Abstract: Structured LLM routing is often treated as a prompt-engineering problem. We argue that it is, more fundamentally, a systems-level burden-allocation problem. As large language models (LLMs) become core control components in agentic AI systems, reliable structured routing must balance correctness, latency, and implementation cost under real deployment constraints. We show that this balance is shaped not only by prompts or schemas, but also by how structural work is allocated across the generation stack: […]