Ask Your Factory Floor Anything: Structuring Industrial Data for AI Agents
A Unified Namespace (UNS) gives every sensor, actuator, and PLC a path in a shared ISA-95 hierarchy. Data from every facility flows into one namespace, whether the transport is MQTT, OPC UA, or Kafka. The structure problem? Solved.
But structure alone doesn’t make those paths queryable. Once you store millions of timestamped values in a dedicated time-series database, the tag_id in each row is just a reference, and the hierarchy that gives it meaning lives somewhere else. Every contextual question (“which presses across all sites ran the most?”) becomes a cross-system integration problem, stitching readings from one store to the equipment context in another.
That’s where the choice of database matters. When time-series data and equipment context share the same PostgreSQL instance, contextualization is a single SQL query, not an integration project.
This article walks through that architecture using a fictional company, ACME Manufacturing, with assembly lines and presses in Detroit and Munich. The examples use MQTT as the transport layer, but the schema patterns apply regardless of protocol.
The single-database advantage
Most UNS deployments split time-series storage from the equipment context. If your readings sit in InfluxDB and your hierarchy lives in a separate relational store, every contextual query requires two systems: one to pull the readings, another to resolve what generated them. Application code stitches the results together. It works, but it doesn’t scale to the queries that make a UNS operationally useful.
PostgreSQL with TimescaleDB closes that gap. Because both datasets share a query planner, the database doesn’t just join them; it enforces data quality and resolves the full hierarchy at query time. No middleware, no cross-system stitching.
That advantage compounds fast. Consider a manufacturer with presses distributed across multiple sites. Each press reports a cycle counter to a UNS path like acme/detroit/assembly/line_01/press_01/press_cycles. Want to know which presses ran the most in the last 24 hours, grouped by site and line? One query. Want an AI agent to understand what a “press” is and answer questions about press performance in plain English? That works too, because the relational schema encodes the concept. The database knows what a press is, where it sits in the hierarchy, and what its readings mean.
That capability rests on a specific set of PostgreSQL features that enforce data quality at the schema level, before any application code runs.
Building the UNS schema
The schema needs two tables. Start with uns_namespace, which maps the full ISA-95 hierarchy from the enterprise down to the individual measurement point:
CREATE TABLE uns_namespace (
id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
enterprise TEXT NOT NULL,
site TEXT NOT NULL,
area TEXT NOT NULL,
line TEXT NOT NULL,
cell TEXT NOT NULL,
tag_name TEXT NOT NULL,
uns_path TEXT GENERATED ALWAYS AS (
enterprise || '/' || site || '/' || area || '/' || line || '/' || cell || '/' || tag_name
) STORED,
schema_version TEXT NOT NULL DEFAULT 'v1',
description TEXT,
UNIQUE (enterprise, site, area, line, cell, tag_name)
);
CREATE UNIQUE INDEX idx_uns_namespace_path ON uns_namespace (uns_path);
Six columns follow the hierarchy: enterprise, site, area, line, cell, tag_name. uns_path is a generated column that assembles the full UNS path automatically. No application constructs it ad hoc, and no two sites can drift on the format.
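Registering ACME's Detroit press counter shows the generated column at work (a sketch; the description text is illustrative):

```sql
-- Register a tag; uns_path is assembled by the database, never by the client.
INSERT INTO uns_namespace (enterprise, site, area, line, cell, tag_name, description)
VALUES ('acme', 'detroit', 'assembly', 'line_01', 'press_01', 'press_cycles',
        'Stamping press cycle counter');

SELECT uns_path FROM uns_namespace WHERE tag_name = 'press_cycles';
-- acme/detroit/assembly/line_01/press_01/press_cycles
```

Because uns_path is a STORED generated column, the path exists the moment the row does, and the unique index rejects any second registration that would collide with it.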
With the hierarchy in place, tag_history stores the time-series readings that reference it:
CREATE TABLE tag_history (
ts TIMESTAMPTZ NOT NULL,
tag_id INT NOT NULL REFERENCES uns_namespace (id),
value DOUBLE PRECISION NOT NULL
);
CREATE UNIQUE INDEX idx_tag_history_dedup
ON tag_history (ts, tag_id);
SELECT create_hypertable('tag_history', 'ts', chunk_time_interval => INTERVAL '1 day');
The schema is deliberately minimal:
- `TIMESTAMPTZ` enforces timezone-aware timestamps at the column level. `ts` stores event time (when the reading occurred at the device), not ingestion time.
- The unique index on `(ts, tag_id)` deduplicates retries at insert time (`ON CONFLICT DO NOTHING`).
Under the hood, create_hypertable turns this table into a TimescaleDB hypertable that partitions automatically by time. In production, continuous aggregates handle incremental rollups without full-table scans. For tuning chunk intervals and compression in manufacturing workloads, see Building a Data Pipeline and Optimizing for High-Volume Production Data.
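As one sketch of that rollup pattern, a continuous aggregate can maintain hourly min/max values per tag incrementally (the view name and intervals here are illustrative choices, not prescriptions):

```sql
-- Incrementally maintained hourly rollup; TimescaleDB refreshes only changed buckets.
CREATE MATERIALIZED VIEW tag_history_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket(INTERVAL '1 hour', ts) AS bucket,
    tag_id,
    MIN(value) AS min_value,
    MAX(value) AS max_value
FROM tag_history
GROUP BY bucket, tag_id;

-- Refresh the last day's buckets every hour, leaving the current hour to settle.
SELECT add_continuous_aggregate_policy('tag_history_hourly',
    start_offset      => INTERVAL '1 day',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```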
The foreign key from tag_history.tag_id to uns_namespace.id is where PostgreSQL earns its keep. Every reading must point to a valid namespace entry, or the insert is rejected. Combined with TIMESTAMPTZ, NOT NULL constraints, and the dedup index, the database enforces data quality at the schema level, before any application code runs. You’ve probably tried enforcing these standards in your ingest code, naming-convention wikis, and spreadsheets. Those work until a commissioning team goes offline or a firmware update changes the payload structure. Database-layer enforcement doesn’t care who wrote the code.
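An ingest statement that exercises both guarantees might look like this (a sketch; the tag_id values and timestamps are illustrative):

```sql
-- Retried delivery: the dedup index on (ts, tag_id) silently drops the duplicate.
INSERT INTO tag_history (ts, tag_id, value)
VALUES ('2025-01-15 08:00:00+00', 1, 847)
ON CONFLICT (ts, tag_id) DO NOTHING;

-- Unknown tag: the foreign key rejects the row before it ever lands.
INSERT INTO tag_history (ts, tag_id, value)
VALUES ('2025-01-15 08:00:00+00', 99999, 12);
-- ERROR: insert or update on table "tag_history" violates foreign key constraint
```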
Other time-series databases handle storage well, but relational enforcement is what PostgreSQL was built for. TimescaleDB runs on PostgreSQL, so you get both time-series performance and relational integrity in one system.
From raw readings to factory-wide context
Because tag_id references uns_namespace directly, a single JOIN contextualizes any reading with its full position in the plant hierarchy:
SELECT
ns.site,
ns.area,
ns.line,
ns.cell,
ns.tag_name,
th.ts,
th.value
FROM tag_history th
JOIN uns_namespace ns ON ns.id = th.tag_id
WHERE th.ts > NOW() - INTERVAL '1 hour';
That’s the core pattern. Every row comes back with what happened, when, and where in the physical plant it happened. No application reconstructs the hierarchy; the database resolves it at query time.
Now, for the question we posed earlier. ACME runs stamping presses across both sites. Each press reports a cycle counter (registered as tag_name = 'press_cycles' in the namespace) that increments with every firing. A shift supervisor pulls up a terminal: which presses ran the most in the past 24 hours, and where are they?
SELECT
ns.site,
ns.area,
ns.line,
ns.cell AS equipment,
MAX(th.value) - MIN(th.value) AS press_cycles
FROM tag_history th
JOIN uns_namespace ns ON ns.id = th.tag_id
WHERE ns.cell LIKE 'press%'
AND ns.tag_name = 'press_cycles'
AND th.ts > NOW() - INTERVAL '1 day'
GROUP BY ns.site, ns.area, ns.line, ns.cell
ORDER BY press_cycles DESC;
PostgreSQL handles the filtering, the JOIN, and the grouped aggregation in a single pass:
| site | area | line | equipment | press_cycles |
|------|------|------|-----------|--------------|
| detroit | assembly | line_01 | press_01 | 847 |
| munich | assembly | line_01 | press_01 | 812 |
| detroit | assembly | line_02 | press_01 | 780 |
| detroit | assembly | line_01 | press_02 | 743 |
| munich | assembly | line_02 | press_01 | 695 |
| munich | assembly | line_01 | press_02 | 641 |
Every row answers three questions at once: what happened (cycle count), where (site, area, line), and what equipment (the specific press). That’s the difference between storing time-series data and contextualizing it.
From one JOIN to an AI-ready factory
Start with PostgreSQL. Put uns_namespace in place first: this table encodes the hierarchy before any sensor publishes. Add TIMESTAMPTZ, NOT NULL constraints, and the dedup indexes as your ingestion pipeline stabilizes.
A UNS gives every tag a path. PostgreSQL gives that path a home next to the data it describes. One database, one query planner, one place where time-series readings and relational context meet. That same JOIN pattern scales to cross-domain questions: add a work_orders table sourced from your ERP, and a single query returns sensor averages and maintenance holds for a specific production order, all resolved through uns_namespace.
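As a sketch of that cross-domain pattern, a minimal ERP-sourced table joined through uns_namespace could look like this (the work_orders columns and the order ID are hypothetical):

```sql
-- Hypothetical ERP-sourced table: one row per production order.
CREATE TABLE work_orders (
    order_id         TEXT PRIMARY KEY,
    site             TEXT NOT NULL,
    line             TEXT NOT NULL,
    started_at       TIMESTAMPTZ NOT NULL,
    finished_at      TIMESTAMPTZ,
    maintenance_hold BOOLEAN NOT NULL DEFAULT FALSE
);

-- Sensor averages during one order, alongside its maintenance status.
SELECT ns.cell, ns.tag_name, AVG(th.value) AS avg_value, wo.maintenance_hold
FROM work_orders wo
JOIN uns_namespace ns ON ns.site = wo.site AND ns.line = wo.line
JOIN tag_history th ON th.tag_id = ns.id
  AND th.ts BETWEEN wo.started_at AND COALESCE(wo.finished_at, NOW())
WHERE wo.order_id = 'WO-1042'
GROUP BY ns.cell, ns.tag_name, wo.maintenance_hold;
```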
This is where AI agents change the equation. An agent doesn’t need to know how the data was cleaned or where the hierarchy came from. It queries the schema and gets trustworthy, contextualized answers because the enforcement already happened at insert time. The relational table is the equipment model, and it lives next to the data it describes. No separate registry, no config file, no lookup service.
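For instance, an agent asked "what presses exist in Munich?" needs nothing beyond the namespace table itself (a sketch using the article's naming convention):

```sql
-- The equipment model is the table: no registry or config file to consult.
SELECT uns_path, description
FROM uns_namespace
WHERE site = 'munich'
  AND cell LIKE 'press%';
```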
MCP (Model Context Protocol) formalizes this pattern: an open standard for connecting AI agents to external data sources. A PostgreSQL schema with consistent vocabulary, clean hierarchies, and enforced constraints is exactly the data layer an MCP server needs. One connection, one schema that already encodes everything the agent requires.
Every query you write from here (shift reports, press utilization, AI agent lookups, digital twin synchronization) benefits from that single decision. That’s the difference between a namespace and a system you can actually ask questions of.