Building a Reliable AI Analytics Agent with BigQuery, a Semantic Layer, and Google ADK

How to prevent your LLM from confidently giving you the wrong numbers

You’ve probably seen it: an LLM generates impressively complex SQL queries in seconds, executes them flawlessly, and returns results that look completely reasonable. Your stakeholders are amazed. Your team is excited. And then someone checks the numbers manually.

They’re wrong.

Not obviously wrong. Subtly, dangerously wrong. The kind of wrong that makes it into a board presentation before anyone notices.

Here’s the uncomfortable truth: Large Language Models are exceptionally good at generating SQL. They are exceptionally bad at understanding your business logic.

If you connect an LLM directly to BigQuery and ask it to answer business questions, you’re playing Russian roulette with your metrics. Eventually, you’ll get incorrect joins, drifting metric definitions, and KPIs that look plausible but won’t be trusted by anyone who actually understands the data.

This article shows you how to build analytics agents that are both powerful and trustworthy, using a semantic layer and Google’s Agent Development Kit (ADK), while keeping everything lightweight and cloud-native.

The Core Problem: Schema ≠ Meaning

Let me show you why this is harder than it looks.

Your BigQuery schema might have these columns:

orders.net_amount
orders.total_amount
orders.tax_amount
orders.refunded_amount
orders.discount_amount

Now, quick question: which one is “revenue”?

  • Is it net of tax?
  • Is it net of refunds?
  • Is it net of discounts?
  • At order grain or line-item grain?
  • Does it include canceled orders?
  • What about orders in the current billing period that haven’t shipped yet?

Humans in your organization know the answer. There’s probably a Slack thread with 47 messages and three Excel attachments that definitively settled this question back in Q2. Your finance team has it documented somewhere. Your data team has it in their tribal knowledge.

LLMs have to guess. And they’ll guess differently every time, depending on how the question is phrased, what context is in the conversation, and which training examples most closely matched the query pattern.

A reliable analytics agent must never guess business logic. Full stop.

The Solution: Separate Meaning from Storage

The architecture we’re building has three distinct layers, each with a clear responsibility:

Natural language question
          ↓
[ADK Agent]           ← Reasoning & orchestration
          ↓
[Semantic Registry]   ← Metrics & definitions
          ↓
[BigQuery]            ← Raw storage & computation

Here’s the key insight that makes this work:

The agent selects definitions, not tables.

Instead of giving your LLM free rein to explore your database schema and construct queries, you give it a curated menu of pre-defined metrics. It’s the difference between:

“Figure out how to calculate revenue from these 47 tables”
“Here are 12 approved metrics you can query, including net_revenue”

Let’s build this layer by layer.

Layer 1: BigQuery Computation Layer (Views)

Your raw tables stay raw. Messy, denormalized, full of edge cases — exactly as data warehouses tend to be.

But you don’t expose raw tables to your semantic layer. Instead, you create analytics views that encode all your complex business logic once, in SQL, where it belongs.

Here’s a concrete example:

-- analytics.fact_orders
CREATE VIEW analytics.fact_orders AS
SELECT
  o.order_id,
  DATE(o.order_timestamp) AS order_date,
  o.customer_id,
  c.country_code AS country,
  o.total_amount,
  o.tax_amount,
  COALESCE(r.refunded_amount, 0) AS refunded_amount,
  o.discount_amount,
  -- Here's where business logic lives
  (o.total_amount - o.tax_amount - COALESCE(r.refunded_amount, 0)) AS net_amount
FROM raw.orders o
LEFT JOIN raw.customers c
  ON o.customer_id = c.customer_id
LEFT JOIN (
  SELECT
    order_id,
    SUM(refund_amount) AS refunded_amount
  FROM raw.refunds
  WHERE refund_status = 'COMPLETED'
  GROUP BY order_id
) r ON o.order_id = r.order_id
WHERE o.order_status != 'CANCELLED';

Notice what this view does:

  • Encodes joins: The relationship between orders, customers, and refunds is handled once
  • Normalizes grain: Everything is at order-level grain
  • Implements business rules: Cancelled orders excluded, only completed refunds counted
  • Derives metrics: net_amount calculation is explicit and consistent
  • Hides complexity: Downstream consumers don’t see the raw mess

This is just SQL, running in BigQuery, using standard views. No new infrastructure, no vendor lock-in, no black boxes. But it’s the foundation that makes everything else possible.
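
To see the payoff, here’s a minimal sketch of what a downstream consumer writes, assuming the google-cloud-bigquery client and the fact_orders view above. No knowledge of refund joins or cancellation rules is required:

from google.cloud import bigquery

client = bigquery.Client()

# Consumers query the view like any table; the refund join,
# the cancellation filter, and the net_amount logic are baked in.
sql = """
SELECT order_date, SUM(net_amount) AS net_revenue
FROM analytics.fact_orders
GROUP BY order_date
ORDER BY order_date DESC
LIMIT 7
"""
for row in client.query(sql).result():
    print(row.order_date, row.net_revenue)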

Layer 2: Semantic Registry (Machine-Readable Metadata)

This is where the magic happens. The semantic registry is a machine-readable catalog that tells your AI agent exactly what metrics exist, how they’re calculated, and what constraints apply.

Here’s what a real metrics registry looks like:

metrics.yaml:

metrics:
  net_revenue:
    description: Revenue excluding tax, refunds, and cancelled orders
    sql: SUM(net_amount)
    source: analytics.fact_orders
    grain: day
    dimensions:
      - order_date
      - country
      - customer_segment
    constraints:
      max_date_range_days: 366

  order_count:
    description: Total number of completed orders
    sql: COUNT(DISTINCT order_id)
    source: analytics.fact_orders
    grain: day
    dimensions:
      - order_date
      - country

  average_order_value:
    description: Average net revenue per order
    sql: SUM(net_amount) / COUNT(DISTINCT order_id)
    source: analytics.fact_orders
    grain: day
    dimensions:
      - order_date
      - country

dimensions.yaml:

dimensions:
  order_date:
    column: order_date
    type: date
    description: Date the order was placed

  country:
    column: country
    type: string
    description: Customer country code (ISO 3166-1 alpha-2)
    allowed_values:
      - US
      - GB
      - FR
      - DE
      - CA

  customer_segment:
    column: customer_segment
    type: string
    description: Customer segmentation tier
    allowed_values:
      - enterprise
      - mid_market
      - smb
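
Because the registry is plain YAML, keeping it internally consistent is cheap. Here’s a minimal sketch, assuming the two files above live in a semantic-layer/ directory, that loads both and verifies every dimension a metric references is actually defined:

import yaml

with open("semantic-layer/metrics.yaml") as f:
    metrics = yaml.safe_load(f)["metrics"]
with open("semantic-layer/dimensions.yaml") as f:
    dimensions = yaml.safe_load(f)["dimensions"]

# Every dimension a metric references must exist in dimensions.yaml
for name, metric in metrics.items():
    for dim in metric.get("dimensions", []):
        assert dim in dimensions, f"{name} references undefined dimension '{dim}'"

print(f"Registry OK: {len(metrics)} metrics, {len(dimensions)} dimensions")

A check like this runs well as a CI step, so a broken registry never reaches the agent.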

Why This Registry Matters for LLMs

Let’s compare what an agent has to do with and without a semantic registry:

Without a registry, the agent must:

  1. Infer which columns represent metrics
  2. Infer how to join tables
  3. Infer the correct grain for aggregation
  4. Infer what filters are valid
  5. Infer which edge cases to handle
  6. Hope all these inferences are correct

With a registry, the agent:

  1. Selects from a finite, approved set of metrics
  2. Reuses pre-validated SQL expressions
  3. Respects explicit constraints
  4. Returns results with documented definitions
  5. Can’t accidentally create new “interpretations” of your metrics

You’re not “prompting better.” You’re removing ambiguity from the system architecture.

This isn’t a prompt engineering trick. This is software engineering.

Layer 3: Google ADK Agent (Reasoning Layer)

Now we get to the AI part. But notice: we built two entire layers before we even touched the LLM.

Google’s Agent Development Kit provides the structured reasoning layer that sits on top of our semantic foundation. ADK gives us:

  • Structured tool calling: The agent invokes functions with typed parameters, not free-form SQL
  • Deterministic execution: Same question → same query → same result
  • Grounding in trusted systems: The agent reasons about metrics, not tables
  • Observability: You can trace exactly how the agent reached its conclusion

Here’s the key constraint: The agent is not allowed to write free SQL.

Instead, it has access to a tool that looks like this:

Tool Contract:

from typing import Any

def query_metric(
    metric_name: str,
    filters: dict[str, Any],
    start_date: str,
    end_date: str,
    group_by: list[str] | None = None,
) -> dict:
    """
    Query a predefined metric from the semantic registry.

    Args:
        metric_name: Name of the metric (e.g., 'net_revenue')
        filters: Dimension filters (e.g., {'country': 'US'})
        start_date: Start date in YYYY-MM-DD format
        end_date: End date in YYYY-MM-DD format
        group_by: Optional list of dimensions to group by

    Returns:
        Query results with metadata
    """
    # Implementation details below
    ...

The agent’s job is to:

  1. Map user intent → metric name
  2. Extract filters from the question
  3. Determine the time window
  4. Validate everything against the registry
  5. Invoke the tool with correct parameters

The tool’s job is to:

  1. Look up the metric definition
  2. Validate filters against allowed dimensions
  3. Enforce constraints (like max date range)
  4. Generate SQL using approved templates
  5. Execute on BigQuery
  6. Return results with full lineage

End-to-End Example: From Question to Query

Let’s watch this system in action.

User Question

“What was net revenue in France last month?”

Agent Reasoning (ADK trace)

Step 1: Parse intent
- Metric needed: net_revenue
- Filter: country = 'FR'
- Time window: last month (2025-12-01 to 2025-12-31)

Step 2: Validate against registry
✓ Metric 'net_revenue' exists
✓ Dimension 'country' is allowed
✓ Value 'FR' is in allowed_values
✓ Date range: 31 days (within max_date_range_days: 366)

Step 3: Generate SQL from template
- Source: analytics.fact_orders
- Aggregation: SUM(net_amount)
- Filters: country = 'FR' AND order_date BETWEEN ...

Step 4: Execute query

Generated SQL (Bounded and Safe)

SELECT
  SUM(net_amount) AS net_revenue
FROM analytics.fact_orders
WHERE country = 'FR'
  AND order_date BETWEEN '2025-12-01' AND '2025-12-31';

Result

{
  "metric": "net_revenue",
  "value": 1247832.45,
  "currency": "USD",
  "filters": {
    "country": "FR",
    "start_date": "2025-12-01",
    "end_date": "2025-12-31"
  },
  "definition": "Revenue excluding tax, refunds, and cancelled orders",
  "source": "analytics.fact_orders",
  "execution_time_ms": 342
}

Notice what didn’t happen:

  • ❌ No guessing about which columns to use
  • ❌ No hallucinated joins
  • ❌ No ambiguity about what “revenue” means
  • ❌ No metric drift over time

The agent followed a deterministic path from question to answer, bounded by explicit rules at every step.

Adding Constraints (Critical for Production)

In a real system, you need guardrails to prevent expensive mistakes. The semantic registry supports this natively:

global_constraints:
  max_date_range_days: 366
  max_results: 10000
  allowed_aggregations:
    - SUM
    - COUNT
    - AVG
    - MAX
    - MIN
  require_date_filter: true

metric_constraints:
  net_revenue:
    required_dimensions:
      - order_date
    allowed_group_by:
      - country
      - customer_segment
      - order_date
    max_date_range_days: 90  # Override global for this metric

These constraints prevent:

  • Accidental full-table scans that cost thousands of dollars
  • Invalid dimension combinations that return nonsense
  • Queries that would time out or hit BigQuery limits
  • Metric misuse (like averaging a sum, or summing an average)

LLMs respect rules if rules are explicit. The key is making those rules machine-readable and enforced at runtime, not buried in documentation.
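
Enforcement is just a lookup and a comparison before any SQL is generated. Here’s a minimal sketch, assuming the constraints.yaml above has been loaded with yaml.safe_load:

from datetime import date

import yaml

with open("semantic-layer/constraints.yaml") as f:
    constraints = yaml.safe_load(f)

def check_date_range(metric_name: str, start: date, end: date) -> None:
    """Reject over-budget requests before any SQL is generated."""
    per_metric = constraints.get("metric_constraints", {}).get(metric_name, {})
    max_days = per_metric.get(
        "max_date_range_days",
        constraints["global_constraints"]["max_date_range_days"],
    )
    if (end - start).days > max_days:
        raise ValueError(
            f"{metric_name}: {(end - start).days}-day range exceeds {max_days}-day limit"
        )

try:
    # net_revenue's 90-day override beats the 366-day global limit
    check_date_range("net_revenue", date(2025, 7, 1), date(2025, 12, 31))
except ValueError as e:
    print(e)  # net_revenue: 183-day range exceeds 90-day limit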

Implementation: ADK Agent Code

Here’s what the actual implementation looks like using Google ADK:

from datetime import datetime

import yaml
from google.adk.agents import Agent
from google.cloud import bigquery


class SemanticLayerAgent:
    def __init__(self, registry_path: str):
        self.bq_client = bigquery.Client()
        self.load_registry(registry_path)

    def load_registry(self, path: str):
        """Load semantic registry from YAML files"""
        with open(f"{path}/metrics.yaml") as f:
            self.metrics = yaml.safe_load(f)["metrics"]
        with open(f"{path}/dimensions.yaml") as f:
            self.dimensions = yaml.safe_load(f)["dimensions"]
        with open(f"{path}/constraints.yaml") as f:
            self.constraints = yaml.safe_load(f)

    def query_metric(
        self,
        metric_name: str,
        filters: dict,
        start_date: str,
        end_date: str,
        group_by: list[str] | None = None,
    ) -> dict:
        """Query a predefined metric from the semantic registry"""

        # Validate metric exists
        if metric_name not in self.metrics:
            return {
                "error": f"Unknown metric: {metric_name}",
                "available_metrics": list(self.metrics.keys()),
            }

        metric = self.metrics[metric_name]

        # Validate date range (per-metric constraint overrides global)
        days = (datetime.fromisoformat(end_date)
                - datetime.fromisoformat(start_date)).days
        max_days = metric.get("constraints", {}).get(
            "max_date_range_days",
            self.constraints["global_constraints"]["max_date_range_days"],
        )
        if days > max_days:
            return {
                "error": f"Date range too large: {days} days (max: {max_days})"
            }

        # Validate filter dimensions and, where defined, allowed values
        for dim, value in filters.items():
            if dim not in metric["dimensions"]:
                return {
                    "error": f"Invalid dimension '{dim}' for metric '{metric_name}'",
                    "allowed_dimensions": metric["dimensions"],
                }
            allowed = self.dimensions[dim].get("allowed_values")
            if allowed and value not in allowed:
                return {
                    "error": f"Invalid value '{value}' for dimension '{dim}'",
                    "allowed_values": allowed,
                }

        # Validate group-by dimensions
        for dim in group_by or []:
            if dim not in metric["dimensions"]:
                return {
                    "error": f"Cannot group '{metric_name}' by '{dim}'",
                    "allowed_dimensions": metric["dimensions"],
                }

        # Build SQL from template
        sql = self._build_sql(metric_name, metric, filters,
                              start_date, end_date, group_by)

        # Execute query
        try:
            query_job = self.bq_client.query(sql)
            results = [dict(row) for row in query_job.result()]
            elapsed_ms = int(
                (query_job.ended - query_job.started).total_seconds() * 1000
            )

            return {
                "metric": metric_name,
                "description": metric["description"],
                "results": results,
                "filters": filters,
                "date_range": {"start": start_date, "end": end_date},
                "sql": sql,
                "execution_time_ms": elapsed_ms,
                "bytes_processed": query_job.total_bytes_processed,
            }
        except Exception as e:
            return {"error": f"Query failed: {e}"}

    def _build_sql(self, metric_name, metric, filters,
                   start_date, end_date, group_by):
        """Generate SQL from metric definition"""
        select_exprs = list(group_by or [])
        select_exprs.append(f"{metric['sql']} AS {metric_name}")

        # Build WHERE clause
        where_conditions = [
            f"order_date BETWEEN '{start_date}' AND '{end_date}'"
        ]
        for dim, value in filters.items():
            # Values were validated against allowed_values above; for
            # free-form dimensions, prefer BigQuery query parameters
            if self.dimensions[dim]["type"] == "string":
                where_conditions.append(f"{dim} = '{value}'")
            else:
                where_conditions.append(f"{dim} = {value}")

        sql_parts = [
            "SELECT " + ", ".join(select_exprs),
            f"FROM {metric['source']}",
            "WHERE " + " AND ".join(where_conditions),
        ]
        if group_by:
            sql_parts.append("GROUP BY " + ", ".join(group_by))

        return "\n".join(sql_parts)


# Create the agent; ADK wraps plain Python callables as function tools,
# so the bound method can be passed directly
semantic_agent = SemanticLayerAgent(registry_path="./semantic-layer")

agent = Agent(
    name="AnalyticsAgent",
    model="gemini-2.0-flash-exp",
    instruction="""You are an analytics expert that helps users query business metrics.

Always use the query_metric tool to answer questions about metrics.
Never generate SQL directly - always use the predefined metrics.

When you receive a question:
1. Identify which metric(s) the user is asking about
2. Extract any filters (country, segment, etc.)
3. Determine the time window
4. Call query_metric with the correct parameters
5. Present the results clearly, including the metric definition

If a metric doesn't exist, tell the user what metrics are available.
If filters are invalid, explain what dimensions are allowed for that metric.
""",
    tools=[semantic_agent.query_metric],
)
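
Before wiring this into a UI, exercise the tool directly. It’s a plain Python method, so the whole validation-and-SQL path is unit-testable without involving the model at all (the values below reuse the running example):

# Call the tool exactly as the agent would
result = semantic_agent.query_metric(
    metric_name="net_revenue",
    filters={"country": "FR"},
    start_date="2025-12-01",
    end_date="2025-12-31",
)

if "error" in result:
    print("Rejected:", result["error"])
else:
    print(result["sql"])      # the exact SQL that ran
    print(result["results"])  # e.g. [{'net_revenue': 1247832.45}]

For interactive testing, the ADK CLI’s adk run and adk web commands give you a local chat loop and a dev UI with full trace visibility.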

Where to Store the Registry

Keep it simple. The semantic registry doesn’t need fancy infrastructure:

Option 1: Git Repository (Recommended)

semantic-layer/
├── metrics.yaml
├── dimensions.yaml
├── constraints.yaml
├── README.md
└── CHANGELOG.md

Benefits:

  • Version control built-in
  • Code review process for changes
  • Easy to audit and rollback
  • Works with existing CI/CD
  • No service dependencies

Option 2: GCS Bucket

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('my-semantic-layer')
blob = bucket.blob('metrics.yaml')
registry = yaml.safe_load(blob.download_as_text())

Why This Works So Well with BigQuery

This architecture plays to BigQuery’s strengths:

BigQuery gives you:

  • Serverless scale (no capacity planning)
  • Sub-second query performance
  • Columnar storage (efficient aggregations)
  • Cost controls (slot reservations, query quotas)
  • SQL expressiveness (standard SQL, window functions, UDFs)

The semantic layer gives you:

  • Consistent metric definitions
  • Governed data access
  • Self-documenting analytics
  • Protection against drift

ADK gives you:

  • Structured reasoning
  • Explainable AI
  • Safe orchestration
  • Production-grade reliability

Together, you get analytics agents you can actually trust in production.

Real Use Cases

This pattern excels in scenarios where correctness matters more than speed:

1. AI Analytics Copilots

Internal tools where employees ask questions about business metrics. The agent can explain trends, compare periods, and drill into dimensions — all while guaranteeing consistent definitions.

2. KPI Explanation Bots

Automated systems that explain why metrics moved. “Revenue dropped 12% in EMEA last week” becomes “Revenue dropped 12% in EMEA last week, driven by a 23% decrease in Germany partially offset by 8% growth in France.”

3. Log and Metric Summarizers

Agents that analyze application logs and metrics to identify anomalies. The semantic layer ensures “error rate” means the same thing across all services.

4. Embedded Analytics in SaaS

Customer-facing analytics where users ask questions about their data. You can’t afford to show them wrong numbers — the semantic layer prevents that.

5. Internal Data Q&A Tools

Slack bots, CLI tools, or web interfaces where teams query data. Instead of writing SQL or waiting for analysts, they ask natural language questions and get trusted answers.

The common thread: anywhere the answer must be correct, not just fluent.

Start Small, Scale Incrementally

You don’t need a perfect semantic layer on day one. Start minimal:

Week 1: Foundation

  • Create one analytics view in BigQuery
  • Define three core metrics in YAML
  • Build one simple ADK tool
  • Test with your most common questions

Week 2: Validation

  • Run queries side-by-side (agent vs. manual SQL)
  • Verify results match your BI dashboards
  • Add constraints based on what breaks
  • Document edge cases you discover

Week 3: Expansion

  • Add more metrics incrementally
  • Introduce dimension hierarchies
  • Implement query caching (see the sketch after this list)
  • Add monitoring and observability
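
On caching: registry-driven queries are highly repeatable, because the same (metric, filters, date range) tuple always produces the same SQL. Here’s a minimal in-memory sketch; a production system might use Redis or lean on BigQuery’s own query cache:

import json
import time

_cache: dict[str, tuple[float, dict]] = {}
CACHE_TTL_SECONDS = 300

def cached_query_metric(agent: SemanticLayerAgent, **params) -> dict:
    """Memoize tool calls so identical requests skip BigQuery entirely."""
    key = json.dumps(params, sort_keys=True)  # deterministic cache key
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    result = agent.query_metric(**params)
    if "error" not in result:  # never cache failures
        _cache[key] = (time.time(), result)
    return result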

Month 2+: Production Hardening

  • Add cost controls and quota management
  • Implement audit logging
  • Create metric approval workflow
  • Build self-service metric registration

The beauty of this architecture: you can evolve incrementally without re-architecture. Every metric you add makes the system more valuable without making it more complex.

Common Pitfalls (and How to Avoid Them)

Pitfall 1: Making the Semantic Layer Too Complex

Don’t: Try to model every possible query pattern upfront
Do: Start with your top 10 most-asked questions and build from there

Pitfall 2: Bypassing the Semantic Layer

Don’t: Give the agent access to raw BigQuery for “flexibility”
Do: Make it impossible to execute queries outside the registry

Pitfall 3: Forgetting About Cost Controls

Don’t: Let agents run unbounded date range queries
Do: Enforce max date ranges, require date filters, and cap bytes billed at the BigQuery level (see the sketch below)
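
Beyond the registry’s date-range limits, BigQuery itself can enforce a hard per-query cost ceiling. A sketch using the standard client’s job config (the 1 GB cap is an illustrative value):

from google.cloud import bigquery

client = bigquery.Client()
sql = """
SELECT SUM(net_amount) AS net_revenue
FROM analytics.fact_orders
WHERE country = 'FR'
  AND order_date BETWEEN '2025-12-01' AND '2025-12-31'
"""

# Optional pre-flight: a dry run prices the query without executing it
dry_job = client.query(
    sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
)
print(f"Would scan {dry_job.total_bytes_processed / 1e9:.2f} GB")

# Hard cap: BigQuery refuses to run the query (and bills nothing)
# if it would scan more than maximum_bytes_billed
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=1 * 1024**3)
rows = client.query(sql, job_config=job_config).result()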

Pitfall 4: No Monitoring

Don’t: Deploy and forget
Do: Track query patterns, costs, accuracy, and user satisfaction

Pitfall 5: Metric Sprawl

Don’t: Let anyone add metrics without review
Do: Implement an approval process for new metrics (even a PR review works)
