Your Automation Followed Instructions. Your AI Agent Makes Decisions.

That Changes Everything

Automation was created to do what it was told. It followed rules and did the same things over and over until something changed.

Then there is Agentic AI.

It understands what you want, weighs the pros and cons, and makes choices.

This change is not about computers; it is about how we work.

It is the point where businesses stop asking, “Can the system do this?” and start asking, “Can we trust what it decides?”

In 2019, I worked with an organisation that used an RPA bot to match invoices from suppliers.

The bot logged into their SAP system, read invoice numbers from a shared folder, matched them with purchase orders, and flagged any discrepancies. It worked well until the screen layout changed slightly. Then the bot stopped working, and the team spent two days matching invoices by hand while it was fixed.

In 2024, the same organisation used an AI agent for the same job.

When a supplier sent an invoice in a format the AI had never seen, it did not stop working. It examined the document, worked out where the information was, checked earlier invoices from the same supplier, and matched the invoice with a confidence score of 0.87. Instead of just stopping, it flagged the new format for a person to check.

Same business problem. Fundamentally different kind of system. The first one followed instructions. The second one made decisions.

The shift from automation to agentic AI is not an incremental improvement in the same category. It is a change in what the system is doing. Automation executes. Agents reason, choose, and act. That distinction has profound implications — for architecture, for governance, and for the question of trust.

Four Stages of Enterprise Intelligence

To understand what is genuinely new about agentic AI, it helps to see where it sits in the evolution of enterprise automation. Most organisations have been through at least three of these stages, often without naming them clearly.

Stage 1 — Rules-Based Automation [RPA Era]

The system follows a script. Every step is explicitly authored by a human. The machine has no understanding of what it is doing — it is matching patterns and executing predefined actions. RPA bots, ETL pipelines, scheduled batch jobs, workflow automation tools: all of these are Stage 1 systems.

The intelligence is entirely in the designer. The machine contributes speed and repeatability. When the environment changes — a screen layout, a file format, an API endpoint — the script breaks and a human fixes it. Stage 1 systems are powerful for stable, high-volume, well-defined processes. They are brittle everywhere else.

Stage 2 — ML-Augmented Automation [Prediction Layer]

A machine learning model is inserted into the automation workflow to handle one decision point that was previously rules-based or human-driven. A fraud score determines whether a transaction is auto-approved or routed to review. A demand forecast drives replenishment order quantities. A churn probability score triggers a retention campaign.

The workflow structure is still human-authored. The orchestration is still deterministic. But one step now uses a learned model rather than a rule. The intelligence is higher at that specific decision point — but the system as a whole is still a script with a smarter subroutine. When the model drifts or produces unexpected outputs, the downstream automation still executes whatever the model returned.
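A minimal Python sketch of a Stage 2 decision point. The threshold, the function names, and the toy scoring function are all hypothetical stand-ins; the only point is that the workflow stays fixed while one step consults a model.

```python
from typing import Callable

# Hypothetical threshold; a real value would come from risk policy.
REVIEW_THRESHOLD = 0.7


def route_transaction(amount: float, score_fn: Callable[[float], float]) -> str:
    """Fixed, human-authored workflow with one model-driven decision point."""
    fraud_score = score_fn(amount)  # the only learned step
    # Everything downstream is deterministic: it acts on whatever the
    # model returned, whether or not the model has drifted.
    if fraud_score >= REVIEW_THRESHOLD:
        return "manual_review"
    return "auto_approve"


def toy_fraud_model(amount: float) -> float:
    """Stand-in for a trained model: larger amounts score higher."""
    return min(amount / 10_000, 1.0)
```

Swapping in a better model changes the score, not the shape of the workflow.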

Stage 3 — Orchestrated AI [Where Most Enterprises Are Now]

Multiple AI capabilities — language models, vision models, structured ML models — are chained together in a human-designed pipeline. An Azure Logic App calls Azure OpenAI to extract key terms from a contract, passes the output to Azure AI Search for similarity matching, scores the result with a custom ML model, and routes the outcome to the appropriate downstream system via Service Bus.

This is more capable than Stage 2, and the outputs are qualitatively richer. But the orchestration logic is still human-authored. The AI handles individual tasks within a workflow that a human designed. If the workflow does not anticipate a particular input or outcome, it does not adapt — it fails or routes to a default handler. The AI is executing assigned tasks, not composing a path to a goal.
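To make the "fixed path, default handler" behaviour concrete, here is a small Python sketch. The three step functions are hypothetical stand-ins for the model calls described above, not real service calls.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(payload: dict, steps: list[Step]) -> dict:
    """Run human-authored steps in a fixed order."""
    for step in steps:
        try:
            payload = step(payload)
        except Exception as exc:
            # No adaptation: anything the designer did not anticipate
            # is handed to the default route.
            return {"route": "default_handler",
                    "failed_step": step.__name__,
                    "error": str(exc)}
    return payload


def extract_terms(payload: dict) -> dict:
    # Stand-in for a language-model extraction call.
    if "term:" not in payload["text"]:
        raise ValueError("no recognisable terms")
    term = payload["text"].split("term:")[1].strip()
    return {**payload, "term": term}


def score_risk(payload: dict) -> dict:
    # Stand-in for a custom ML model.
    return {**payload, "risk": 0.8 if "indefinite" in payload["term"] else 0.2}


def route(payload: dict) -> dict:
    target = "legal_review" if payload["risk"] > 0.5 else "auto_file"
    return {**payload, "route": target}


STEPS = [extract_terms, score_risk, route]
```

The order of STEPS is decided at design time. An input that extract_terms cannot handle never reaches the later steps, and nothing in the pipeline tries another way.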

Stage 4 — Agentic AI [The Shift That Changes the Rules]

The agent receives a goal. Not a workflow — a goal. It then decides what steps to take, which tools to invoke, in what order, and adapts based on what comes back. When a step fails or returns an unexpected result, the agent does not crash — it reassesses and tries a different approach. When the task requires a capability it was not explicitly instructed to use, it selects from a catalogue of available tools based on what the situation requires.

This is what the invoice matching agent was doing. Nobody wrote a script that said ‘if you see an unfamiliar document format, examine the structure, infer the mappings from context, cross-reference historical patterns, and flag for human review.’ The agent composed that path toward the goal of completing the match accurately. The intelligence is no longer in the script. It is in the agent’s reasoning.

The critical distinction: automation asks ‘what is the next step in the sequence?’ Agentic AI asks ‘what do I need to accomplish, and what is the most effective way to get there?’ That is not a marginal improvement. It is a different kind of system.
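The loop below is a rough Python sketch of that difference, not a production agent. The policy argument stands in for the language model: in a real agent that function would be a model call, whereas stub_policy here is itself a tiny script, used only so the example runs. The tool names are hypothetical.

```python
from typing import Any, Callable, Optional

Tool = Callable[[str], Any]
# A policy sees the goal and the history so far and returns the name of
# the next tool to try, or None when it considers the goal met.
Policy = Callable[[str, list], Optional[str]]


def run_agent(goal: str, document: str, tools: dict, policy: Policy,
              max_steps: int = 5) -> list:
    """Goal-driven loop: choose a tool, observe the result, reassess."""
    history: list = []
    for _ in range(max_steps):
        tool_name = policy(goal, history)
        if tool_name is None:
            break
        try:
            result = tools[tool_name](document)
            history.append({"tool": tool_name, "ok": True, "result": result})
        except Exception as exc:  # a failed step is information, not a crash
            history.append({"tool": tool_name, "ok": False, "error": str(exc)})
    return history


def parse_known_format(doc: str) -> dict:
    if not doc.startswith("INV|"):
        raise ValueError("unrecognised layout")
    return {"invoice_no": doc.split("|")[1]}


def infer_layout(doc: str) -> dict:
    # Crude stand-in for model-driven extraction from an unfamiliar layout.
    token = next(t for t in doc.split() if t.startswith("#"))
    return {"invoice_no": token.lstrip("#"), "needs_review": True}


def stub_policy(goal: str, history: list) -> Optional[str]:
    if not history:
        return "parse_known_format"
    if history[-1]["ok"]:
        return None  # goal met
    if history[-1]["tool"] == "parse_known_format":
        return "infer_layout"  # reassess after the failure
    return None


TOOLS = {"parse_known_format": parse_known_format, "infer_layout": infer_layout}
```

Calling run_agent("match the invoice", "Invoice #A17 from Acme", TOOLS, stub_policy) records the failed parse, then the fallback extraction flagged for review. The failure becomes part of the history the policy reasons over instead of ending the run.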

Three Things That Are Genuinely Different

Agentic AI is not automation with a better AI inside. The following three properties are qualitatively new — and each one has direct implications for how you design, deploy, and govern these systems.

1. The agent chooses its tools

In automation, the tools are fixed. A human decides which APIs to call, which databases to query, which systems to write to, and bakes those decisions into the workflow. In an agentic system, the agent selects tools from a catalogue at runtime, based on what the current task requires.

On Azure, this means defining a tool registry — a set of functions the agent can call, each with a description of what it does and what inputs it accepts. Azure OpenAI’s function-calling capability, combined with Semantic Kernel’s plugin architecture, allows the agent to evaluate which tools are relevant to the current step and invoke them accordingly. The agent is making an architectural decision — which capability to use — at inference time, not at design time.

The implication: your tool catalogue is now a governance surface. Every tool you expose to an agent is something the agent can use autonomously. If you expose a tool that sends emails, the agent may send emails. If you expose a tool that writes to a production database, the agent may write to the production database. Authority limits — which tools the agent can invoke autonomously versus which require human approval — need to be encoded in the tool definitions and enforced by the constraint layer, not assumed from the agent’s training.
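One way to picture a tool catalogue that carries its own authority limits is the plain-Python sketch below. ToolSpec, ToolRegistry, and the requires_approval flag are hypothetical names for illustration; this is not the Semantic Kernel plugin API or the Azure OpenAI function-calling schema.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str             # what the model sees when choosing a tool
    handler: Callable[..., Any]
    requires_approval: bool      # the authority limit lives with the tool


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def catalogue(self) -> list:
        """Names and descriptions exposed to the agent; handlers stay private."""
        return [{"name": t.name, "description": t.description}
                for t in self._tools.values()]

    def invoke(self, name: str, approved: bool = False, **kwargs: Any) -> Any:
        spec = self._tools[name]  # KeyError for tools that were never exposed
        if spec.requires_approval and not approved:
            raise PermissionError(f"{name} requires human approval")
        return spec.handler(**kwargs)
```

The design choice worth noting is that the approval check sits in invoke, outside the model. The agent can ask for any tool in the catalogue, but it cannot talk its way past the flag.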

2. The agent maintains context across steps

Traditional automation is effectively stateless between steps. Each function executes, passes its output to the next, and has no memory of what came before beyond the data it received. If a step fails midway through a complex process, the system typically either retries from the beginning or routes to a dead-letter queue.

Agentic systems maintain a reasoning context across the full task. The agent remembers what it tried, what the result was, what alternatives it considered, and why it chose the path it did. When a tool call fails — a database query returns no results, an API times out — the agent does not stop. It considers why the step failed and whether a different approach might succeed.

On Azure, this is implemented through the agent’s conversation history — the running record of the agent’s thoughts, tool calls, and tool responses that Semantic Kernel or the Azure AI Agent Service maintains across the task execution. This context window is not infinite, and managing what stays in context as tasks become more complex is a real engineering challenge. But the capability itself — adaptive multi-step reasoning with memory of prior steps — is what enables agents to handle the kind of unstructured, variable tasks that automation cannot.
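As an illustration of the context-management problem, here is one simple trimming strategy in Python: always keep the goal, then keep as many of the most recent entries as fit. The word-count "tokenizer" and the entry shape are assumptions made for the sketch; a real system would use the model's own tokenizer.

```python
def trim_history(history: list, max_tokens: int) -> list:
    """Keep the first entry (the goal) plus the newest entries that fit."""

    def cost(entry: dict) -> int:
        # Crude stand-in for a tokenizer: one token per whitespace-separated word.
        return len(entry["content"].split())

    if not history:
        return []
    goal, rest = history[0], history[1:]
    budget = max_tokens - cost(goal)
    kept: list = []
    for entry in reversed(rest):  # walk back from the newest entry
        if cost(entry) > budget:
            break
        kept.append(entry)
        budget -= cost(entry)
    return [goal] + kept[::-1]
```

Dropping the oldest steps is the bluntest option, since it also drops the record of what the agent already tried. Summarising older steps is a common alternative. Either way, the full untrimmed history still belongs in the audit trail.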

3. The agent’s reasoning is emergent, not scripted

This is the most consequential difference, and the one with the most significant governance implications.

In a scripted automation, the path from input to output is fully determined at design time. You can trace every decision to a specific rule or condition that a human wrote. When something goes wrong, you read the script and find the error.

In an agentic system, the path from input to output is composed at runtime. The agent reasons through the task — and the specific reasoning chain it follows, the tools it selects, the intermediate conclusions it reaches — was not explicitly authored by a human. It emerged from the interaction between the agent’s training, the available tools, the goal it was given, and the inputs it received.

This means that for any given output, you cannot simply read a script to understand why the agent did what it did. You need the agent to have recorded its reasoning — in the form of an audit trail that captures not just what it did, but the intermediate steps, the tool calls it made, and the outputs that shaped its decisions. Audit trails for agentic systems are not a nice-to-have. They are the minimum requirement for governing a system whose behaviour is emergent rather than scripted.

What Has Not Changed — And Why That Matters More Now

The shift to agentic AI changes the architecture. It does not change the governance questions. In fact, it makes them more urgent.

The question (automation) → Why it’s harder now (agentic AI)

  • Who is responsible when the system gets it wrong? → The agent composed the path; no human authored it.
  • What decisions can the system make autonomously? → The agent chooses its tools, so authority limits must be explicit.
  • Can we explain what the system did and why? → The reasoning was emergent, so the audit trail must capture it.
  • How do we detect when the system is drifting? → The behaviour is adaptive, so drift can look like normal variation.
  • What happens when a human needs to intervene? → The agent may be mid-task, so override design is more complex.

Each of these questions existed in the automation world. But in automation, the answers were embedded in the script — you could read the workflow and find the authority limits, the decision logic, the audit trail. In an agentic world, none of those answers are in the script, because there is no script. They need to be in the architecture.

Authority limits must be explicit and enforced at the tool level

If an automation workflow could transfer funds, a human designed that capability in and presumably reviewed it. If an agentic system has access to a fund transfer tool, the agent may invoke it autonomously as part of a task the human never intended to involve fund transfers. The authority limit is not visible in any script. It needs to be enforced in the tool definition — this tool requires human approval before execution — and in a constraint enforcement service that intercepts any tool call above a defined authority threshold before it executes.
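A constraint check of this kind can be very small. The Python sketch below assumes a hypothetical policy table that maps each tool to the largest amount the agent may act on alone; the tool names and limits are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolCall:
    tool: str
    amount: float = 0.0


# Hypothetical policy table: tool name -> largest amount the agent may
# act on without a human.
AUTONOMOUS_LIMITS = {
    "lookup_invoice": float("inf"),
    "transfer_funds": 1_000.0,
}


def check(call: ToolCall) -> Verdict:
    """Run before the tool executes, never after."""
    limit = AUTONOMOUS_LIMITS.get(call.tool)
    if limit is None:
        return Verdict.DENY  # tools missing from the table are denied by default
    if call.amount > limit:
        return Verdict.NEEDS_APPROVAL
    return Verdict.ALLOW
```

Deny-by-default matters here: a tool that someone exposes to the agent but forgets to add to the policy table is blocked rather than silently allowed.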

The audit trail must capture reasoning, not just actions

A traditional automation audit log records: step executed, input received, output produced. For governance purposes, that is usually sufficient — you can reconstruct the decision from the log.

An agentic audit trail needs to record: agent’s goal, reasoning step, tool selected, tool inputs, tool outputs, next reasoning step. Not just what the agent did, but the chain of thought that led it there. On Azure, this means capturing the full agent conversation history — including intermediate reasoning — in an append-only store such as Azure SQL with row-level security, or Azure Cosmos DB with a change feed for real-time monitoring. Immutability is non-negotiable: the audit trail is the governance record, and it must be tamper-evident.
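To show what "tamper-evident" can mean in practice, the Python sketch below uses a hash chain: each record stores the hash of the one before it, so editing an old record breaks every later link. This is my own illustration of the idea, not how Azure SQL or Cosmos DB implement it, and the field names are assumptions based on the list above.

```python
import hashlib
import json


class AuditTrail:
    """Append-only log where each record commits to the previous one."""

    def __init__(self) -> None:
        self._records: list = []

    @staticmethod
    def _digest(body: dict) -> str:
        canonical = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def append(self, goal: str, reasoning: str, tool: str,
               tool_input: dict, tool_output: dict) -> dict:
        body = {
            "goal": goal,
            "reasoning": reasoning,
            "tool": tool,
            "tool_input": tool_input,
            "tool_output": tool_output,
            "prev_hash": self._records[-1]["hash"] if self._records else "",
        }
        record = {**body, "hash": self._digest(body)}
        self._records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and check that each link points at its predecessor."""
        prev = ""
        for record in self._records:
            body = {k: v for k, v in record.items() if k != "hash"}
            if body["prev_hash"] != prev or self._digest(body) != record["hash"]:
                return False
            prev = record["hash"]
        return True
```

The reasoning field is written at the moment the step happens. It cannot be reconstructed later from the tool inputs and outputs alone.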

Shadow mode before autonomous operation is not optional

In automation, shadow mode means running the new script in parallel with the old process to verify outputs before cutting over. In an agentic system, shadow mode means running the agent in parallel with human decision-makers — the agent completes the task, humans make the actual decisions, and the divergence between the two is systematically analysed.

This is how you build an evidence base for where the agent can be trusted to act autonomously and where it needs human oversight. It is also how you discover edge cases the agent handles poorly before those edge cases cause real harm. No agentic system should move from shadow mode to autonomous operation without a structured analysis of divergence patterns, confidence calibration, and failure modes.
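A structured divergence analysis can start as simply as grouping shadow-mode cases by segment. The Python sketch below assumes each case is a dict with hypothetical segment, agent, and human fields.

```python
from collections import defaultdict


def agreement_rate(cases: list) -> float:
    """Aggregate share of cases where agent and human made the same decision."""
    return sum(c["agent"] == c["human"] for c in cases) / len(cases)


def divergence_by_segment(cases: list) -> dict:
    """Share of cases per segment where agent and human decisions differ."""
    totals: dict = defaultdict(int)
    diverged: dict = defaultdict(int)
    for case in cases:
        totals[case["segment"]] += 1
        if case["agent"] != case["human"]:
            diverged[case["segment"]] += 1
    return {seg: diverged[seg] / totals[seg] for seg in totals}
```

With 18 standard invoices where agent and human agree and 2 new-format invoices where they disagree, agreement_rate reports 0.9 while divergence_by_segment reports 1.0 for the new-format segment. The aggregate number hides exactly the cases that matter.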

Trust Protocols for Agentic AI

When machines begin to make decisions, the challenge shifts from execution to judgment. Trust isn’t automatic — it must be designed into the system. On Azure AI Foundry, that means layering protocols that make agentic AI transparent, accountable, and governable.

Shadow Mode Deployment

  • Agents run in parallel with existing workflows.
  • Their decisions are logged but not enacted.
  • Human teams compare agent outputs against manual or scripted results.
  • Builds confidence before agents are allowed to act autonomously.

Transparent Dashboards

  • Every agent decision is surfaced with confidence scores and retrieved context.
  • Finance teams can see why the agent acted — the documents, rules, or patterns it relied on.
  • Dashboards become the audit trail, not just a black box.

Human-in-the-Loop Escalation

  • High‑impact or low‑confidence actions are routed to human reviewers.
  • Escalation queues in Service Bus + Logic Apps ensure accountability.
  • Humans remain the final authority for exceptional cases.
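The escalation rule can be sketched in a few lines of Python. The in-process queue is only a stand-in for a Service Bus queue, and the 0.9 cut-off is a hypothetical value that would need calibrating against shadow-mode data.

```python
import queue
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    confidence: float  # the agent's self-reported score, 0.0 to 1.0
    high_impact: bool


# Hypothetical cut-off; calibrate it against shadow-mode data.
MIN_CONFIDENCE = 0.9

# In-process stand-in for a durable escalation queue.
review_queue: "queue.Queue[ProposedAction]" = queue.Queue()


def dispatch(action: ProposedAction) -> str:
    """Escalate anything high-impact or low-confidence; execute the rest."""
    if action.high_impact or action.confidence < MIN_CONFIDENCE:
        review_queue.put(action)
        return "escalated"
    return "executed"
```

Under this rule the invoice match from the opening example, scored at 0.87, goes to a human reviewer rather than being posted automatically.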

Constraint Enforcement

  • A lightweight policy service validates every consequential tool call.
  • Authority limits (e.g., transaction thresholds, approval rules) are enforced before execution.
  • Prevents agents from acting outside governance boundaries.

Behavioural Monitoring

  • Azure Monitor + Application Insights track patterns of agent behaviour.
  • Frequent low‑confidence scores, repeated tool failures, or clustered human overrides are early signals of drift.
  • Continuous evaluation ensures alignment over time.

What the Architecture Looks Like on Azure AI Foundry

To understand how the transition from orchestrated AI to agentic AI happens in practice, it helps to see how Azure AI Foundry structures the ecosystem. The change isn’t just a new prompt — it’s a new architectural layer.

Agent Framework — Semantic Kernel as the Cognitive Orchestrator

Semantic Kernel acts as the agent definition and reasoning layer within Foundry.

  • Provides the plugin architecture for the agent’s tool catalogue.
  • Manages memory and context persistence across multi‑step reasoning.
  • Uses the planner to compose tool calls dynamically toward the agent’s goal.
  • Integrates natively with Azure OpenAI for function calling and reasoning.

Agent Hosting — Azure AI Agent Service for Lifecycle Management

Azure AI Agent Service (preview) is Foundry’s managed hosting layer for agents.

  • Maintains agent definitions, conversation threads, and tool registrations.
  • Handles state management for multi‑turn interactions.
  • Integrates with Microsoft Entra ID for secure, role‑based tool authorisation.

Knowledge Layer — Azure AI Search for Grounded Retrieval

Agents reason over enterprise knowledge through hybrid retrieval in AI Foundry.

  • Combines keyword search, vector embeddings, and metadata filters.
  • Grounds reasoning in retrieved context to reduce hallucination.
  • Captures retrieved documents as part of the audit trail for transparency.
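Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses to merge keyword and vector rankings in a hybrid query. The plain-Python sketch below illustrates that idea only; it is not the service's implementation, and the document IDs and the constant k = 60 are assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several ranked lists of document ids into one scored ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores: dict = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_hits = ["inv-17", "inv-03", "inv-42"]
vector_hits = ["inv-42", "inv-17", "inv-88"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
retrieved_ids = [doc_id for doc_id, _ in fused]  # worth logging with the decision
```

Here inv-17 and inv-42 appear in both lists and outrank the documents found by only one retriever. Logging retrieved_ids alongside the agent's decision is what lets a reviewer later see which documents the reasoning was grounded in.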

Governance Layer — Constraint Enforcement Before Action

A lightweight Azure Function acts as the policy gatekeeper.

  • Intercepts tool calls with real‑world consequences (database writes, transactions).
  • Validates actions against authority limits and compliance rules.
  • Logs validation outcomes and routes exceptions to human approval queues via Service Bus and Logic Apps.

Monitoring Layer — Azure Monitor + Application Insights for Behavioural Observability

Standard infrastructure monitoring is not sufficient for agentic systems. You need behavioural observability: tracking which tools the agent invokes most frequently, where it diverges from expected paths, where confidence scores are consistently low, and where human overrides cluster. These patterns are the early warning system for agent drift and misalignment.

Beyond infrastructure metrics, Foundry enables behavioural telemetry.

  • Tracks tool invocation patterns, confidence scores, and divergence from expected paths.
  • Identifies clusters of human overrides — early signals of agent drift or misalignment.
  • Feeds insights into evaluation pipelines for continuous governance.
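One of these signals, clustered human overrides, can be tracked with a rolling window. The Python sketch below is a hypothetical in-memory monitor; in practice the events would come from telemetry, and the window size and alert rate are assumptions to tune.

```python
from collections import deque


class OverrideMonitor:
    """Rolling human-override rate over the last `window` agent decisions."""

    def __init__(self, window: int, alert_rate: float) -> None:
        self._events: deque = deque(maxlen=window)
        self._alert_rate = alert_rate

    def record(self, overridden: bool) -> None:
        self._events.append(overridden)

    @property
    def rate(self) -> float:
        return sum(self._events) / len(self._events) if self._events else 0.0

    def drifting(self) -> bool:
        # Only alert once the window is full, to avoid noise from a
        # handful of early events.
        window_full = len(self._events) == self._events.maxlen
        return window_full and self.rate >= self._alert_rate
```

A rolling window is used instead of a lifetime average because a long run of good behaviour would otherwise mask a recent change.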

What I Got Wrong Early

When I first started working with agentic systems, I made the mistake of thinking about them as sophisticated orchestration — smarter pipelines with a language model at the centre. I focused on the tool definitions, the prompt design, the model selection. I treated governance as a downstream concern, something to add once the agent was working reliably.

That framing was wrong in three specific ways.

  1. I underestimated how quickly agents reach for tools in unexpected ways. In one early deployment, an agent given access to both a customer database query tool and an email composition tool — for separate purposes — combined them in a task the original design never intended, querying the customer database to populate a draft email. The agent was being helpful. The action was not authorised. The constraint enforcement layer was not in place. We caught it in testing. We might not have caught it in production.
  2. I designed audit trails that captured actions but not reasoning. When we needed to investigate why the agent had taken a particular path, the logs showed what it did but not why it chose that approach over alternatives. Reconstructing the reasoning from action logs alone is not possible. The reasoning needs to be captured at the time it happens.
  3. I moved from shadow mode to autonomous operation based on aggregate accuracy metrics rather than a systematic divergence analysis. The agent’s overall accuracy was high. But the cases where it diverged from human decisions were clustered in specific edge case patterns — and those patterns were not visible in aggregate accuracy. A structured divergence analysis before go-live would have revealed them.

The governance architecture for an agentic system needs to be designed before the agent is. Not because the agent is dangerous, but because the agent’s behaviour is emergent — and emergent behaviour needs designed oversight, not assumed oversight.

The Decision We Are Actually Making

When our organisation moves from automation to agentic AI, we are making a decision about delegation. Automation delegates execution — the human decides what to do and the machine does it. Agentic AI delegates decision-making — the human defines the goal and the machine decides how to achieve it.

That is a meaningful change in the relationship between the organisation and the technology. It is not a reason to avoid agentic AI. The capability is genuine, the business value is real, and organisations that figure out how to govern it effectively will have a significant advantage over those that either avoid it or deploy it without appropriate oversight.

But it is a reason to take the governance architecture as seriously as the agent architecture. The organisations that are getting the most value from agentic AI right now are not the ones who deployed the most capable agents. They are the ones who designed the most thoughtful boundaries — explicit authority limits, complete audit trails, systematic shadow mode processes — and then gave their agents room to operate within those boundaries.

The RPA bot broke when a field moved three pixels. It broke cleanly, visibly, and immediately. We knew exactly what failed and why.

An agent that makes decisions within poorly designed boundaries does not break cleanly. It operates, makes choices, produces outputs — and the problem may not be visible until the outputs have accumulated into something the business cannot easily reverse.

Design the boundaries first. Then build the agent.

The shift from automation to agentic AI is the most significant change in enterprise technology in a generation. The organisations that will benefit most are not the ones that deploy fastest. They are the ones that govern best.

Conclusion: From Execution to Judgment

The evolution from automation to agentic AI isn’t just a technical milestone — it’s a leadership one. Enterprises that once measured success by efficiency now measure it by trust, adaptability, and accountability. When systems begin to reason, the question shifts from “Can it execute?” to “Can we trust its judgment?” The organisations that thrive in this new era will be those that design for transparency, govern for confidence, and treat AI not as a tool to control, but as a colleague to collaborate with. That changes everything — not just how work gets done, but how decisions shape the enterprise itself.


Your Automation Followed Instructions. Your AI Agent Makes Decisions. was originally published in Towards AI on Medium.
