Teaching OpenClaw to Use GPT-5.4 Pro

digitado ⋅ April 17, 2026

I had a Mac Mini running OpenClaw as a personal automation server. It handled dozens of daily tasks without me lifting a finger. But when I needed the reasoning power of ChatGPT, specifically GPT-5.4 Pro, a model only available through the web interface, I was the middleware. I was the copy-paste layer between two systems that should have been talking to each other.

That felt wrong. So I fixed it.

Now, when I send a message in Telegram, OpenClaw opens ChatGPT in my actual Chrome browser, types in the prompt, waits for the response to finish streaming, reads it, and brings the answer back to me, all without me switching a single window. You can see the full automation in action here.

When APIs Don’t Have What You Need

If you’ve spent any time building on top of OpenAI’s ecosystem, you’ve probably noticed an uncomfortable truth: the models you can access through the API and the models you can use in ChatGPT’s web interface are not the same list.

GPT-5.4 Pro, for instance, is only available to Plus and Pro subscribers through the browser. The API has its own catalog with its own pricing tiers. OpenAI gates certain capabilities behind the consumer subscription, and if you want those capabilities programmatically, you’re out of luck.

For someone trying to build a personal AI assistant that can leverage the best available model, this is a real limitation. I wanted my assistant to operate ChatGPT through the browser, using my existing subscription to access GPT-5.4 Pro directly. The browser session became, in a strange way, the most powerful AI endpoint I had access to.

So the question became: what if my assistant could just use the browser the same way I do?

Why Browser Control Matters

First, there’s session continuity. Every conversation OpenClaw starts shows up in my ChatGPT history, so I can pick up any thread later without checking separate logs.

Second, and this is the part that surprised me, it’s actually simpler. No API keys to rotate. No token counting. No billing surprises. My subscription is already paid. The browser is already open. I’m just letting my assistant sit in the same chair I sit in.

The Piece That Made It Possible

The solution came from Actionbook, a tool that helps agents understand how to interact with web pages. It provides structured information about what’s clickable, fillable, or readable on a page, so agents don’t have to parse HTML or figure out CSS selectors on their own.

The key advantage: Actionbook connects to your existing browser session. Most automation tools launch a fresh, isolated browser with no cookies or login state. But Actionbook works with the Chrome instance you’re already using, the one where you’re already logged into ChatGPT with your subscription active. No separate authentication, no session management, just direct control of the browser you already have open.

Let OpenClaw Set Itself Up

The only manual step was installing the Actionbook Chrome extension from the Chrome Web Store. Once installed, I gave OpenClaw a simple instruction:

Install actionbook CLI from https://github.com/actionbook/actionbook
Run actionbook setup, and pick extension mode for browser connection.

OpenClaw handled the rest. It ran the setup wizard, detected the environment, chose the extension mode when prompted, and established the connection. The whole process took about five minutes, most of which was the agent working through the interactive setup on its own.

Watching the Agent Work

The first time I tested it, I watched the entire sequence unfold on screen.

I typed a message in Telegram:

Use actionbook to open ChatGPT and ask:
What are the latest trends in AI agents for 2026? Bring me the answer.

Chrome came to the foreground. The ChatGPT page loaded. My question appeared in the input field. The send button was clicked. The response started streaming. When it finished, the text was sent back to me in Telegram.

What struck me was what I didn’t have to specify. I never told OpenClaw where the input box was or which button to click. It figured that out from Actionbook’s page descriptions.

Behind the scenes, the sequence looked like this:

actionbook search "chatgpt"
actionbook get chatgpt.com:/:default
actionbook browser snapshot
actionbook browser fill "your question here" --ref-id e3
actionbook browser click --ref-id e4
actionbook browser wait-idle
actionbook browser text
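
For readers who want to script the same round trip themselves, here is a minimal Python sketch of the sequence above. It is not how OpenClaw drives Actionbook internally; it only replays the commands listed here through subprocess. The function names build_ask_sequence and run_sequence are hypothetical, and the ref ids e3 and e4 are simply the ones from this example. A real run would read them from the snapshot output rather than hard-coding them.

```python
import subprocess
from typing import Callable, List


def build_ask_sequence(prompt: str, input_ref: str = "e3",
                       send_ref: str = "e4") -> List[List[str]]:
    """Return the argv lists for one ask-and-read round trip.

    Subcommands and flags are copied from the sequence in the post.
    The default ref ids are the ones from that example (an assumption:
    in practice they come from the snapshot step).
    """
    return [
        ["actionbook", "search", "chatgpt"],
        ["actionbook", "get", "chatgpt.com:/:default"],
        ["actionbook", "browser", "snapshot"],
        ["actionbook", "browser", "fill", prompt, "--ref-id", input_ref],
        ["actionbook", "browser", "click", "--ref-id", send_ref],
        ["actionbook", "browser", "wait-idle"],
        ["actionbook", "browser", "text"],
    ]


def run_sequence(commands: List[List[str]],
                 run: Callable = subprocess.run) -> str:
    """Run each command in order and return the stdout of the last one."""
    output = ""
    for argv in commands:
        # check=True stops the sequence as soon as one step fails.
        result = run(argv, capture_output=True, text=True, check=True)
        output = result.stdout
    return output
```

Passing the runner in as a parameter keeps the sketch testable without a browser: swap in a fake run function and inspect the argv lists it receives.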

I never see any of those steps. From my end, I asked a question and got an answer. That’s what real automation should feel like.

Beyond ChatGPT: The Bigger Picture

Once I had this working, I started seeing applications everywhere.

One of the first things I tried was parallel GEO testing: sending the same prompt to ChatGPT in multiple tabs simultaneously to compare how responses differ across contexts. This is useful for understanding model behavior, testing prompt robustness, or just running multiple research threads at once. OpenClaw opens the tabs, sends the prompts, and collects all the responses. What would take me fifteen minutes of manual tab-switching takes about thirty seconds.
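
The fan-out pattern behind that test can be sketched in a few lines of Python. This is an illustration under assumptions, not OpenClaw's implementation: fan_out is a hypothetical name, and the ask callable is a placeholder for whatever drives a single tab, since the post does not show how individual tabs are addressed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(prompt: str, contexts: List[str],
            ask: Callable[[str, str], str]) -> Dict[str, str]:
    """Send one prompt under several contexts concurrently.

    `ask(context, prompt)` is a hypothetical callable that drives one
    tab and returns the response text. The result maps each context
    to its response.
    """
    # One worker per context; at least one so an empty list is valid.
    with ThreadPoolExecutor(max_workers=len(contexts) or 1) as pool:
        futures = {ctx: pool.submit(ask, ctx, prompt) for ctx in contexts}
        # result() blocks until that tab's answer is ready.
        return {ctx: fut.result() for ctx, fut in futures.items()}
```

Threads fit here because each call spends nearly all its time waiting on the browser, so the total wall time is close to that of the slowest tab rather than the sum of all of them.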

OpenClaw can operate any page I’m already logged into. Any web application where I have an active session (whether it’s a CRM, a project management tool, or a banking portal) becomes a potential automation surface. The agent doesn’t need an API. It doesn’t need OAuth tokens. It just needs the browser and an action manual.

This is a fundamentally different model of automation than what most of us are used to. Traditional automation says: “Find an API, get credentials, write integration code, handle edge cases, maintain it when the API changes.” Browser-native automation says: “If you can use it, your agent can use it.”

What This Means for Agents

OpenClaw now operates my browser the same way I would. It uses ChatGPT’s best models, pulls data from web apps without APIs, and controls any tool I’m logged into, all from a Telegram message. It’s not perfect. Pages load slowly sometimes, and website redesigns require updating the action manuals. But the principle is simple: if I can use it in my browser, any agent can use it too. The browser became the universal interface, not just for OpenClaw, but for any agent that needs to interact with the web.
