Harness Engineering for Language Agents: The Harness Layer as Control, Agency, and Runtime

Language agents that act through tools, files, browsers, APIs, and persistent sessions are shaped by more than the base model or a single prompt. Their reliability depends on a harness layer that determines which instructions remain authoritative, what actions are available, how state is carried forward, and how failures are handled over time. This paper argues that this layer warrants explicit treatment in NLP. We propose and operationalize a working decomposition of the harness layer into control, agency, and runtime (CAR); situate harness engineering in the arc from software engineering through prompt and context engineering; and audit 63 harness-relevant works, revealing a meaningful visibility gap between academic papers and public engineering notes. We further argue that many reported agent gains may be partly harness-sensitive rather than purely model-driven, and propose HarnessCard as a lightweight reporting artifact. Grounded in papers, benchmarks, protocols, and engineering notes collected through April 21, 2026, we argue that progress in language agents should be reported in terms of not only the model, but also the harness layer that turns capability into governed action.