Why the AI Agent Utilization Gap Is an Infrastructural Problem, Not a Managerial One

Companies are building 50,000 AI agents, but they’re only running 5,000. For some experts, that ratio reveals an important lack of trust.

Enterprises across industries have begun building their own AI agents to streamline workflows and free up resources for more human-centric operations.

The problem is that many of these enterprises can’t trust their agents at scale because they lack a robust AI accountability layer.

Without mechanisms such as confidence scoring, error traceability, human-in-the-loop intervention points, and governance infrastructure that lets operators know which agents are production-ready before deployment, enterprises have little choice but to bottleneck their own operations.

According to Antonio Bustamante, a four-time founder and CEO of the enterprise-scale AI infrastructure company bem.ai, this barrier is what’s keeping enterprises from profiting from the efficiency gains AI agents are meant to provide.

The Problem With Most Multi-Agent Systems

Many enterprises are relying on multi-agent systems to automate their more tedious tasks. Still, it’s becoming increasingly clear that these systems can create as much work as they alleviate, if not more. For example, some companies require 20 people to review their AI agents’ outputs, which erodes much of the efficiency those agents were meant to deliver.

This pattern helps explain why enterprises that have built upwards of 50,000 AI agents run only 5,000 daily. That 10:1 ratio between creation and utilization is evidence of the experimentation-to-production gap: only about one in ten agents built makes it into daily use. Similarly, there have been instances in which AI agents joined a company’s online communication channels, only for the team working with them to go silent because they lacked a framework for their interactions.

Enterprises likely know the issue exists. What they may not know is how to address it properly. Antonio believes the solution lies in improving AI infrastructure.

The Importance of Confidence Scoring

Among the different ways enterprises could make their AI agents more trustworthy, confidence scoring may be one of the better options.

Without knowing how uncertain an agent is about its outputs, operators can’t know which agents to trust in production. With confidence scoring, by contrast, agents get the infrastructure they need to evaluate their own findings, reducing the need for operators to do the same. When an agent can explain how and why it’s uncertain, operators can be more confident that the tools they’re using hold themselves accountable.
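To make the idea concrete, here is a minimal Python sketch of confidence-based routing, in which outputs above a threshold proceed automatically and everything else goes to a human reviewer. It is an illustration only, not bem’s implementation: the `AgentOutput` fields, the `route` and `review_queue` helpers, and the 0.9 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class AgentOutput:
    """One agent result plus its self-reported confidence (hypothetical schema)."""
    agent_id: str
    result: str
    confidence: float  # assumed to be a score between 0.0 and 1.0
    rationale: str = ""  # optional explanation of why the agent is uncertain


def route(output: AgentOutput, threshold: float = 0.9) -> str:
    """Decide whether an output can proceed or needs human review.

    The 0.9 default is an arbitrary illustrative value, not a recommendation.
    """
    if not 0.0 <= output.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if output.confidence >= threshold:
        return "auto_approve"
    return "human_review"


def review_queue(outputs: list[AgentOutput], threshold: float = 0.9) -> list[AgentOutput]:
    """Return only the outputs that need a human, least confident first."""
    flagged = [o for o in outputs if route(o, threshold) == "human_review"]
    return sorted(flagged, key=lambda o: o.confidence)
```

Under these assumptions, reviewers see only the flagged outputs, ordered from least to most confident, rather than checking every result an agent produces.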

Antonio compares confidence scoring and other forms of accountability infrastructure to assembly lines, noting that assembly lines worked because Ford standardized inputs first. Since agent orchestration often fails on messy, non-standardized enterprise data, enterprises need to prioritize accountability before worrying about efficiency.

Mitigating the Risks of Liability

When agents fail in industries like insurance, finance, and healthcare, they can easily cause serious liability issues that put a business’s reputation and bottom line at risk.

The solution, then, would be to create AI agents that don’t fail. Antonio describes bem as one such solution, calling it “the agent orchestration platform for things that can’t fail.” In practice, bem provides what any production-ready agent infrastructure should offer: systems that are flexible when needed, rigid in outcomes, and traceable when something goes wrong.

AI agents clearly have potential, and one of their greatest barriers to growth is a lack of reliable infrastructure. With the right accountability systems, particularly confidence scoring, agents can be deployed so that enterprises no longer have to allocate so many resources to fact-checking their fact-checking tools. Instead, operators can focus on what they do best, stepping in to manage their agents only when necessary.

:::tip
This story was distributed as a release by Jon Stojan under HackerNoon’s Business Blogging Program.

:::
