From Chatbot to Agent: The ReAct Loop That Changed Everything

digitado ⋅ May 28, 2026

How interleaving reasoning and action transforms text generators into problem-solving systems

For years, we called systems “intelligent” when they could produce convincing text. That was an understandable mistake. The output looked coherent, the tone felt confident, and the latency was low enough to make the interaction feel alive. But the moment you asked one of those systems to verify a live fact, inspect the result of an API call, or revise a plan after new evidence arrived, the illusion started to crack.

That gap is what made early LLM applications so impressive in demos and so brittle in production. A model could explain a process it had never run, summarize a policy it had never checked, and answer a question whose real answer had changed since training. The quality of the prose disguised the absence of a loop between thought and world. Once teams began wiring models into software systems, the obvious question emerged: how do you get a language model to do more than autocomplete the most plausible answer?

The answer that changed the field was not a larger prompt. It was a loop.

The Failure Mode of Pure Generation

A pure text-generation interaction is structurally simple. The user asks. The model predicts. The system returns. That pattern is elegant precisely because it excludes everything messy: no tool calls, no external observations, no intermediate validation, no explicit plan revision.

It also fails the moment the task depends on information outside the prompt or action outside the model.

Take a mundane example. A user asks for the current price of NVIDIA stock. A model trained on historical data cannot know the live quote. It can, however, generate a highly plausible number. That is not malice. It is just the natural behavior of a system optimized to continue text, not to verify the world. OpenAI’s current agent-building guidance makes the limitation explicit in its rationale for web search and other tools: if you want information beyond the model’s cutoff, you must give the system access to a live source.[1]

The problem gets more serious as soon as the task stops being factual lookup and becomes multi-step work. If the model needs to inspect a ticket, call a search tool, compare results, and update a record, pure generation is not merely unreliable. It is structurally incapable of doing the job.

That limitation set the stage for one of the most important papers in the modern agent stack.

ReAct Turned Text Generation into a Control Loop

In late 2022, Shunyu Yao and collaborators introduced ReAct: a prompting pattern that interleaves reasoning traces with external actions.[2] The move sounds modest. In practice, it was decisive.

Instead of forcing the model to jump straight from question to answer, ReAct lets the system alternate among three states. First, it reasons about what it knows and what it lacks. Then it takes an action, usually by calling a tool. Then it receives an observation from the environment and updates its next step accordingly. That observation can be a search result, a database row, a compiler error, a web page, or any other structured return from the outside world.

The pattern looks like this in miniature:

User: Is the Tuesday database fix actually deployed?

Thought: I need deployment status and the original ticket context.
Action: get_incident("INC-4821")
Observation: status=open, related_change=CHG-912

Thought: I have the change record ID but not deployment confirmation.
Action: get_change_status("CHG-912")
Observation: status=completed, deployed_at=2026-03-09T22:14:00Z

Thought: The fix was deployed last night. I should answer directly and note timing.
Response: Yes. The change linked to the incident completed on March 9 at 22:14 UTC.

The conceptual jump is larger than the syntax suggests. The model is no longer asked to simulate competence in one shot. It is allowed to notice missing information, gather evidence, and continue reasoning after the environment answers back.
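
The trace above can be driven by a very small controller. What follows is a minimal sketch, not the paper's implementation: `scripted_model` is a hypothetical stand-in for a real LLM call, the two tools return canned strings, and the `Action: name("arg")` syntax is simply the format used in the trace.

```python
import re
from typing import Callable, Dict

# Hypothetical tools standing in for real incident and change-management APIs.
def get_incident(incident_id: str) -> str:
    return "status=open, related_change=CHG-912"

def get_change_status(change_id: str) -> str:
    return "status=completed, deployed_at=2026-03-09T22:14:00Z"

TOOLS: Dict[str, Callable[[str], str]] = {
    "get_incident": get_incident,
    "get_change_status": get_change_status,
}

# Matches lines such as: Action: get_incident("INC-4821")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("([^"]*)"\)\s*$', re.MULTILINE)
RESPONSE_RE = re.compile(r"^Response:\s*(.+)$", re.MULTILINE)

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: chooses the next step from the observations so far."""
    seen = transcript.count("Observation:")
    if seen == 0:
        return 'Thought: I need the incident record.\nAction: get_incident("INC-4821")'
    if seen == 1:
        return 'Thought: I need the change status.\nAction: get_change_status("CHG-912")'
    return "Thought: I have confirmation.\nResponse: Yes, CHG-912 completed."

def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"User: {question}"
    for _ in range(max_steps):
        step = model(transcript)          # reason: the model sees everything so far
        transcript += "\n" + step
        final = RESPONSE_RE.search(step)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            # A malformed step is reported back instead of crashing the loop.
            transcript += "\nObservation: error: no parsable action"
            continue
        name, arg = action.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"error: unknown tool {name}"
        transcript += f"\nObservation: {observation}"   # observe, then loop again
    raise RuntimeError("Step limit exceeded")

print(react("Is the Tuesday database fix actually deployed?", scripted_model))
```

The only state is the growing transcript, which is why each observation can change what the model does next.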

Why the Loop Matters More Than the Label

The easiest way to misunderstand ReAct is to treat it as a fashionable prompt format. Its real contribution is architectural.

Once an agent is allowed to reason after each observation, several things become possible at once. The system can defer commitment until it has evidence. It can recover from partial failure because tool errors become new observations instead of dead ends. It can chain multiple steps without pretending the first plan was perfect. And, crucially, it can reveal enough intermediate state for engineers to debug why a workflow succeeded or failed.
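
One way to make "tool errors become new observations" concrete is to wrap every tool call so that it always returns something the model can read. This is a sketch under assumptions: `run_tool`, the result dictionary shape, and `lookup_ticket` are illustrative names, not part of any particular framework.

```python
from typing import Any, Callable, Dict

def run_tool(
    tools: Dict[str, Callable[..., Any]],
    name: str,
    arguments: Dict[str, Any],
) -> Dict[str, Any]:
    """Invoke a tool and always return an observation, even on failure."""
    tool = tools.get(name)
    if tool is None:
        return {"ok": False, "tool": name, "content": f"unknown tool: {name}"}
    try:
        return {"ok": True, "tool": name, "content": str(tool(**arguments))}
    except Exception as exc:  # broad on purpose: any failure becomes feedback
        return {"ok": False, "tool": name, "content": f"{type(exc).__name__}: {exc}"}

def lookup_ticket(ticket_id: str) -> str:
    # Hypothetical tool that fails for IDs it does not know.
    if ticket_id != "INC-4821":
        raise KeyError(ticket_id)
    return "status=open"

tools = {"lookup_ticket": lookup_ticket}
good = run_tool(tools, "lookup_ticket", {"ticket_id": "INC-4821"})
bad = run_tool(tools, "lookup_ticket", {"ticket_id": "INC-0000"})
```

Appending `bad["content"]` to the agent's state lets the next reasoning step decide whether to retry, try a different tool, or report the failure.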

That last point is where the pattern matured in production. You do not need a model to expose every hidden internal token to benefit from ReAct-like behavior. In practice, many systems log concise plans, tool choices, and structured step metadata instead of dumping raw chain-of-thought. The engineering win comes from explicit intermediate state, not from romanticizing the model’s private monologue.
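
As an illustration of logging structured step metadata rather than raw chain-of-thought, the sketch below records one compact entry per step. The field names and the `StepLog` class are assumptions chosen for the example; a real system would adapt them to its own tracing stack.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

@dataclass
class StepRecord:
    step: int
    plan: str                  # one-line summary of intent, not the raw reasoning
    tool: str
    arguments: Dict[str, Any]
    ok: bool
    latency_ms: float

@dataclass
class StepLog:
    records: List[StepRecord] = field(default_factory=list)

    def record(self, plan: str, tool: str, arguments: Dict[str, Any],
               ok: bool, started_at: float) -> StepRecord:
        """Append one step; started_at is a time.perf_counter() reading."""
        entry = StepRecord(
            step=len(self.records) + 1,
            plan=plan,
            tool=tool,
            arguments=arguments,
            ok=ok,
            latency_ms=round((time.perf_counter() - started_at) * 1000, 3),
        )
        self.records.append(entry)
        return entry

    def to_jsonl(self) -> str:
        """One JSON object per line, convenient for log pipelines."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self.records)

log = StepLog()
t0 = time.perf_counter()
log.record("Look up the incident", "get_incident", {"incident_id": "INC-4821"}, True, t0)
t1 = time.perf_counter()
log.record("Confirm deployment", "get_change_status", {"change_id": "CHG-912"}, True, t1)
print(log.to_jsonl())
```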

This distinction matters because the reasoning loop is not valuable as theater. It is valuable because it creates checkpoints between decisions and actions.

Chain-of-Thought Opened the Door

ReAct did not appear in a vacuum. Earlier in 2022, Jason Wei and colleagues showed that chain-of-thought prompting could dramatically improve reasoning performance in sufficiently large models by eliciting intermediate steps rather than demanding immediate answers.[3] That result helped establish a broader principle: when difficult tasks are decomposed into explicit intermediate reasoning, models often perform better.

ReAct extends that idea into an interactive environment. The model does not merely think step by step. It thinks, acts, inspects reality, and then thinks again. That extra “inspect reality” stage is what makes the pattern so useful in software systems. It takes reasoning out of a sealed prompt and embeds it in a feedback loop.

Once you see that clearly, the difference between a chatbot and an agent stops being fuzzy. A chatbot produces language from context. An agent updates its behavior from observations.

Reflexion Added a Second Loop: Self-Critique

Even with tools, the first attempt is not always good enough. The agent may answer too quickly, miss a corner case, or misread a tool result. Reflexion, introduced by Noah Shinn and collaborators in 2023, adds a second loop around the first one.[4]

The core move is simple. After generating an answer or taking a trajectory through the task, the agent critiques its own performance using feedback signals expressed in language rather than weight updates. In the original paper, that reflective feedback is stored in an episodic memory buffer and used to improve future trials. The important idea for practitioners is that the model can act as both performer and reviewer.

A compact implementation looks like this:

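class ReflexionAgent:
    def __init__(self, llm, quality_threshold: float = 0.8, max_iterations: int = 3):
        self.llm = llm
        self.quality_threshold = quality_threshold
        self.max_iterations = max_iterations

    async def _generate(self, context: str) -> str:
        # Placeholder: wrap your own LLM call here.
        raise NotImplementedError

    async def _reflect(self, task: str, response: str) -> tuple[str, float]:
        # Placeholder: return (critique text, score between 0 and 1).
        raise NotImplementedError

    async def solve(self, task: str) -> str:
        context = f"Task: {task}"
        response = ""

        for _ in range(self.max_iterations):
            response = await self._generate(context)
            critique, score = await self._reflect(task, response)

            # Stop as soon as the self-assessed quality is good enough.
            if score >= self.quality_threshold:
                return response

            # Otherwise feed the critique back in and try again.
            context = (
                f"Task: {task}\n"
                f"Previous response: {response}\n"
                f"Critique: {critique}\n"
                f"Revise the answer."
            )

        # Out of iterations: return the last attempt.
        return response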

This pattern is easy to oversell, so it is worth being precise about what it buys you. Reflexion does not magically create truth. It creates a structured opportunity for the system to catch incompleteness, shallow reasoning, weak organization, or obvious contradictions before the answer leaves the building. In workflows like coding, planning, and long-form synthesis, that second pass often pays for itself.
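
The episodic memory mentioned earlier can be sketched separately from the critique loop. The version below is a simplification under stated assumptions: `attempt` and `evaluate` are hypothetical callables standing in for LLM-backed components, the evaluator and the reflection step are merged into one function, and the bounded buffer size is an arbitrary choice.

```python
from collections import deque
from typing import Callable, Deque, Tuple

def run_trials(
    task: str,
    attempt: Callable[[str, Tuple[str, ...]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_trials: int = 3,
    memory_size: int = 3,
) -> Tuple[str, Tuple[str, ...]]:
    """Retry a task, carrying verbal reflections from failed trials forward."""
    memory: Deque[str] = deque(maxlen=memory_size)  # oldest reflections drop off
    answer = ""
    for _ in range(max_trials):
        answer = attempt(task, tuple(memory))
        success, reflection = evaluate(answer)
        if success:
            break
        memory.append(reflection)   # feedback stored as language, not weights
    return answer, tuple(memory)

# Toy stand-ins: the attempt only improves once a reflection is available.
def attempt(task: str, reflections: Tuple[str, ...]) -> str:
    if any("rollback" in r for r in reflections):
        return "Deploy the fix, then document the rollback plan."
    return "Deploy the fix."

def evaluate(answer: str) -> Tuple[bool, str]:
    if "rollback" in answer:
        return True, ""
    return False, "The answer omitted a rollback plan."

answer, reflections = run_trials("Plan the database fix", attempt, evaluate)
```

Here the second trial succeeds only because the first trial's reflection was kept, which is the behavior the memory buffer is meant to buy.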

It also exposes an important production trade-off. Better answers cost more time and tokens. Once you add critique loops, you have left the world of single-shot latency.

Tree of Thoughts Broadened the Search

ReAct assumes a largely serial path: inspect, act, observe, continue. But some problems are not hard because the next step is unclear. They are hard because there are several plausible next steps and the first one may be a trap.

That is where Tree of Thoughts enters. In 2023, Shunyu Yao and colleagues generalized chain-of-thought into a framework that explores multiple reasoning paths, evaluates them, and can look ahead or backtrack when needed.[5] Instead of treating one candidate thought sequence as destiny, the system treats intermediate thoughts as search nodes.

For planning-heavy tasks, this changes the economics of reasoning. You spend more compute up front, but you buy optionality. If a travel-planning agent, incident-response agent, or research agent faces multiple viable decompositions, Tree of Thoughts gives it a way to compare branches rather than overcommit to the first decent-looking idea.
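
To make "intermediate thoughts as search nodes" concrete, here is a toy breadth-first search with a beam. It is a sketch, not the paper's code: in Tree of Thoughts the proposals and the evaluations come from the language model, whereas this example uses fixed arithmetic operations and a distance heuristic so that it runs on its own. The names `OPS`, `propose`, and `tree_search` are illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A "thought" here is (current value, operations applied so far).
Thought = Tuple[int, Tuple[str, ...]]

OPS: Dict[str, Callable[[int], int]] = {
    "+4": lambda v: v + 4,
    "*3": lambda v: v * 3,
    "-1": lambda v: v - 1,
}

def propose(thought: Thought) -> List[Thought]:
    """Generate every candidate next thought from the current one."""
    value, path = thought
    return [(fn(value), path + (name,)) for name, fn in OPS.items()]

def tree_search(start: int, target: int, beam_width: int = 3,
                max_depth: int = 4) -> Optional[Tuple[str, ...]]:
    """Expand several branches per level and keep only the most promising."""
    if start == target:
        return ()
    frontier: List[Thought] = [(start, ())]
    for _ in range(max_depth):
        candidates = [c for t in frontier for c in propose(t)]
        for value, path in candidates:
            if value == target:
                return path
        # Evaluate: closer to the target is better. Pruning weak branches
        # here is what lets the search drop a dead end and follow another.
        candidates.sort(key=lambda t: abs(target - t[0]))
        frontier = candidates[:beam_width]
    return None

path = tree_search(start=2, target=14)
```

A `beam_width` of 1 collapses this into a single greedy chain; widening the beam is where the extra compute, and the optionality, comes from.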

That is the deeper pattern connecting ReAct, Reflexion, and Tree of Thoughts. None of them are really about clever prompts. All of them are about replacing the single-shot answer with a controlled process.

The Real Production Pattern Is Perceive, Reason, Act, Inspect

Once these techniques move out of papers and into applications, they start to converge on a practical loop:

The system perceives an input, reasons about the gap between what it knows and what it needs, acts through tools or APIs, inspects the observation, and either answers or continues. Sometimes it critiques itself. Sometimes it explores alternate branches. Sometimes it hands the work to a specialized agent. But the governing shape remains the same.

OpenAI’s current agent guidance describes the same stack in product terms: models, tools, state or memory, and orchestration.[1] That is not accidental. ReAct and its descendants provided the conceptual blueprint for how those parts work together.

A production-grade loop often ends up looking like this:

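async def agent_loop(user_input: str, tools, llm, max_steps: int = 6):
    # Conversation state the model sees on every turn.
    state = [{"role": "user", "content": user_input}]

    # Bound the loop so a confused model cannot run forever.
    for _ in range(max_steps):
        # llm.respond is a placeholder for whatever model client you use.
        response = await llm.respond(state, tools=tools)

        if response.type == "tool_call":
            # Act, then feed the observation back so the next turn can use it.
            result = await tools[response.name](**response.arguments)
            state.append({"role": "tool", "name": response.name, "content": str(result)})
            continue

        if response.type == "final":
            return response.content

    raise RuntimeError("Step limit exceeded")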

That code is intentionally plain because the point is not cleverness. The point is that the loop exists. The model gets to update its next move after the world pushes back.

What Changed, Exactly?

Before this shift, the dominant question in LLM applications was, “How do I prompt the model to say the right thing?” After this shift, the better question became, “What loop should this model run so it can discover, verify, and refine the right thing?”

That sounds like a small edit in phrasing. It changes the entire implementation.

You stop treating tools as accessories and start treating them as sensory organs and hands. You stop treating latency as a nuisance and start treating it as the price of verification. You stop evaluating only answer quality and start evaluating trajectories: which tools were called, which branches were explored, where the model hesitated, and whether the loop recovered from bad observations.
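
Evaluating trajectories can start with very simple counts over a logged run. The sketch below assumes each step is a dictionary with a `type` of `"tool_call"` or `"final"` and, for tool calls, a `tool` name and an `ok` flag; both the schema and the definition of "recovered" (at least one tool error, yet the run still ended in a final answer) are assumptions made for the example.

```python
from collections import Counter
from typing import Any, Dict, List

def summarize_trajectory(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reduce a logged run to a few trajectory-level metrics."""
    tool_calls = [s for s in steps if s["type"] == "tool_call"]
    errors = [s for s in tool_calls if not s["ok"]]
    finished = bool(steps) and steps[-1]["type"] == "final"
    return {
        "num_steps": len(steps),
        "tool_counts": dict(Counter(s["tool"] for s in tool_calls)),
        "tool_errors": len(errors),
        "finished": finished,
        # Recovered: something went wrong, and the loop still reached an answer.
        "recovered": bool(errors) and finished,
    }

trajectory = [
    {"type": "tool_call", "tool": "get_incident", "ok": True},
    {"type": "tool_call", "tool": "get_change_status", "ok": False},
    {"type": "tool_call", "tool": "get_change_status", "ok": True},
    {"type": "final"},
]
summary = summarize_trajectory(trajectory)
```

Comparing these summaries across many runs shows which tools fail most often and how often the loop absorbs a failure without human help.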

That is the moment a chatbot becomes an agent. Not when it sounds more confident. When it becomes capable of being corrected by reality.

What To Try Next

If you are building one now, start with a workflow where wrong answers are expensive but the action space is still small. Make the loop visible. Log the steps. Watch where the model reaches for tools too early, too late, or not at all. The fastest way to understand agent behavior is to inspect the loop it runs when the first answer is not enough.

And if your team has found a cleaner way to balance verification, latency, and cost, I would like to see that design. The field has no shortage of model comparisons. What it needs more of is honest architecture.

[1] OpenAI, “Building agents,” developer documentation, accessed March 2026.
[2] Shunyu Yao et al., “ReAct: Synergizing Reasoning and Acting in Language Models,” arXiv, first submitted October 2022, revised March 2023.
[3] Jason Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” arXiv, first submitted January 2022.
[4] Noah Shinn et al., “Reflexion: Language Agents with Verbal Reinforcement Learning,” arXiv, first submitted March 2023.
[5] Shunyu Yao et al., “Tree of Thoughts: Deliberate Problem Solving with Large Language Models,” arXiv, first submitted May 2023.

From Chatbot to Agent: The ReAct Loop That Changed Everything was originally published in Towards AI on Medium.
