Production Observability for Spring AI Agents on Amazon Bedrock Without Writing Tracing Code

digitado ⋅ 21 May 2026

You shipped your first AI agent to production last quarter. It works. Customers love it. Then your finance lead pings you on Slack:

“Our Bedrock bill went up 4x last week. Which feature is burning the tokens?”

You stare at CloudWatch. You stare at your code. You realize you have no idea, because you never instrumented per-request token usage.
A week later, a customer complaint lands on your desk: “the assistant gave me wrong information at 2:47 PM yesterday.” You need to find that exact request, see the prompt, see the model response, and figure out what went sideways. You can’t, because your logs don’t carry session IDs and your APM doesn’t know what a Bedrock request ID is.
A month later, your security team runs a routine scan and finds customer emails and SSNs in your Datadog traces, copy-pasted there because somebody decided to log the prompt for debugging. Your CISO is now in your one-on-one.

Three problems. One root cause: AI agents are not normal HTTP services, and the standard observability stack doesn’t know what to do with them.

This tutorial walks through a Spring Boot starter that addresses all three at once: spring-ai-agentcore-observability. You add one dependency alongside Spring AI’s Bedrock starter, set a couple of properties, and every Bedrock call ships fully enriched OpenTelemetry spans with PII already redacted. We’re going to build a working agent, validate it against live Amazon Bedrock, and inspect the span output byte for byte.

Everything you read below has been verified against amazon.nova-lite-v1:0 running in us-east-1. The full validation report is open-source.

The five problems you hit in production

If you’re running a Spring AI agent on Amazon Bedrock at any scale, you will eventually hit all five of these:

[Figure: Five problems in production]

The thing the diagram is hinting at: these aren’t five separate engineering tickets. They share the same answer. You need a single, drop-in instrumentation layer that knows what an LLM request looks like, where the AWS-side correlation IDs come from, and how to scrub strings before they hit the wire. Hand-instrumenting each agent is a tax that compounds over time.

That’s what this starter is.

What you actually get

[Figure: Dependency tree]

That’s the deal. Two dependencies and three properties for five capabilities you’d otherwise wire by hand in every microservice.

Quick refresher: what is observability with OpenTelemetry?

If you’ve been writing services for a while, skim this. If LLM observability is new to you, this section is the reason the rest of the article makes sense.

Observability is the practice of inferring what a running system is doing from the data it emits. Three signals carry that data: traces, metrics, and logs. The one this article leans on most is the trace.

A trace is the story of one request as it travels through your system. It’s made of one or more spans, where each span is a unit of work (an HTTP call, a database query, a Bedrock invocation) with a start time, end time, attributes, and events.
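To make the vocabulary concrete, here’s a minimal plain-Java sketch of a two-span trace. SimpleSpan, SpanEvent, and TraceExample are hypothetical types (not the OpenTelemetry SDK’s SpanData), and the timestamps are made-up values; the point is the shape: the Bedrock span shares the request’s trace ID and points at the HTTP span as its parent.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span model for illustration; not the SDK's SpanData.
record SpanEvent(String name, long timeNanos, Map<String, Object> attributes) {}

record SimpleSpan(
    String traceId,       // same for every span in one request
    String spanId,        // unique per unit of work
    String parentSpanId,  // null for the root span
    String name,
    long startNanos,
    long endNanos,
    Map<String, Object> attributes,
    List<SpanEvent> events) {

  long durationNanos() {
    return endNanos - startNanos;
  }
}

final class TraceExample {
  // One request: the HTTP server span, with the Bedrock call nested as its child.
  static List<SimpleSpan> sampleTrace() {
    SimpleSpan http = new SimpleSpan("trace-1", "span-1", null, "POST /invocations",
        0L, 1_500_000_000L, Map.of("http.request.method", "POST"), List.of());
    SimpleSpan bedrock = new SimpleSpan("trace-1", "span-2", "span-1", "bedrock call",
        100_000_000L, 1_400_000_000L,
        Map.of("gen_ai.request.model", "amazon.nova-lite-v1:0"),
        // An event is a timestamped record attached to the span (name is hypothetical).
        List.of(new SpanEvent("prompt", 100_000_000L, Map.of("gen_ai.prompt", "..."))));
    return List.of(http, bedrock);
  }
}
```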

OpenTelemetry (OTel) is the vendor-neutral standard for emitting these signals. It defines:

A wire protocol (OTLP) so any backend can receive your data
A set of SDKs (Java, Python, Go, Node, …) for producing it
Semantic conventions — agreed-upon attribute names so a span from one service looks the same as a span from another. The HTTP convention says use http.request.method. The database convention says db.system. The GenAI convention says gen_ai.usage.input_tokens.

Once your service speaks OTel, you can change backends without changing code. Move from Jaeger to Datadog to X-Ray to Grafana Tempo with a config flip. That’s the value proposition.

What semantic conventions mean for LLM apps

Until recently, OTel didn’t have a story for AI. Every team rolled their own attribute names: tokens_used, prompt_size, model_name, input_count. Dashboards didn’t survive a refactor. Alerts on “tokens spent” couldn’t aggregate across services because they used different keys.

In 2024 the GenAI semantic conventions landed. Now there’s a standard set of names:

The starter we’re building with emits exactly these names. That’s not an implementation detail; it’s the difference between “my dashboard works” and “my dashboard works on every backend, forever, even if I switch providers.”
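For reference, here are the GenAI names this article relies on, written out as plain Java string constants. GenAiNames and its spanAttributes helper are hypothetical, for illustration only; in real code you’d reuse published constants (the starter exposes its own in GenAiTelemetrySupport) rather than retyping strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// GenAI semantic-convention names used in this article, as plain strings.
final class GenAiNames {
  static final String REQUEST_MODEL = "gen_ai.request.model";
  static final String USAGE_INPUT_TOKENS = "gen_ai.usage.input_tokens";
  static final String USAGE_OUTPUT_TOKENS = "gen_ai.usage.output_tokens";
  static final String RESPONSE_FINISH_REASONS = "gen_ai.response.finish_reasons";
  // Metric: token-usage histogram, split by gen_ai.token.type ("input" / "output").
  static final String CLIENT_TOKEN_USAGE = "gen_ai.client.token.usage";
  static final String TOKEN_TYPE = "gen_ai.token.type";
  // Not GenAI-specific: the general-purpose error classification attribute.
  static final String ERROR_TYPE = "error.type";

  private GenAiNames() {}

  // Attributes for one model call, keyed only by standard names so any
  // backend (and any other service) can query them the same way.
  static Map<String, Object> spanAttributes(
      String model, long inputTokens, long outputTokens, List<String> finishReasons) {
    Map<String, Object> attributes = new LinkedHashMap<>();
    attributes.put(REQUEST_MODEL, model);
    attributes.put(USAGE_INPUT_TOKENS, inputTokens);
    attributes.put(USAGE_OUTPUT_TOKENS, outputTokens);
    attributes.put(RESPONSE_FINISH_REASONS, List.copyOf(finishReasons));
    return attributes;
  }
}
```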

Why every Spring Boot LLM app needs this

A traditional Spring Boot REST service has well-understood failure modes. The DB is slow. The cache is cold. The downstream service is throttling. You instrument once, build standard dashboards, move on.

LLM-backed services break that pattern in five ways at once. Each one creates a kind of blindness.

Concretely, here is what each blindness costs you in production, and what an OTel-instrumented LLM app gives you instead:

| Blindness | What you actually need to see |
|---|---|
| Cost is invisible per endpoint | Token histogram with gen_ai.request.model and gen_ai.token.type so you can group spend by feature, model, or customer tier |
| Latency mixes inference time with HTTP overhead | Span hierarchy showing the Bedrock call as its own child span with provider-side latency |
| Errors look like generic 5xx | error.type classified to rate_limit, timeout, authentication_failure, invalid_request, server_error so alerts route to the right team |
| Prompts disappear into logs | Opt-in span events for prompt and completion, masked before export, queryable from your APM |
| No way to reproduce a bad answer | gen_ai.response.finish_reasons plus the captured completion plus a request-id pivot to provider-side logs |

The defenses you get from a properly instrumented LLM service

When this is wired up, the operational defenses you gain look like this:

Each one of those defenses is the difference between “my LLM app works” and “I can run my LLM app in regulated production at scale and sleep at night.”

Where this starter fits in the OTel ecosystem

The starter does one thing: it takes Spring AI’s response metadata and turns it into OTel signals that follow the GenAI semantic conventions. Everything downstream (the SDK, OTLP, the collector, the backend) is standard OTel infrastructure that already exists in most production stacks. You’re not buying into a new tool; you’re filling in the LLM-shaped hole in the one you already use.

Architecture: how the magic happens

Before we write code, let’s understand what’s about to run inside your JVM. The starter is built around three moving parts: an AOP aspect, a PII masker, and an exporter wrapper. They cooperate without you ever seeing them.

End-to-end request flow

Plain English: a client posts a prompt to /invocations. The AgentCore HTTP controller dispatches to your @AgentCoreInvocation method. The aspect (the dotted line) wraps the controller transparently, so you never write tracing code. Your method calls Spring AI, which calls Bedrock, and the response flows back. The aspect reads token counts from the response metadata, builds the span, and hands it to the OTel SDK. Just before bytes leave the JVM, the masker scrubs sensitive strings. Whatever sits at the right edge (Datadog, Jaeger, X-Ray) only ever sees redacted content.

What the aspect actually does, step by step

Notice step 13 versus step 14. The aspect ends the span with the raw prompt content on it. The masker doesn’t run until the SDK hands the span to the exporter. That’s deliberate: masking is a transformation on the way out the door, not a mutation on the way in. It means the in-memory span is always inspectable for debugging, but nothing crosses the network boundary unredacted.
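Masking on the way out is the decorator pattern applied to an exporter. The sketch below assumes hypothetical SpanSink, MaskingSink, and CapturingSink types standing in for OpenTelemetry’s SpanExporter API, with spans reduced to string maps. It isn’t the starter’s code, but it shows why the in-memory span keeps its raw values while only the exported copy is scrubbed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for an OTel SpanExporter: receives finished spans.
interface SpanSink {
  void export(List<Map<String, String>> spans);
}

// Decorator that masks on the way out. It copies each span's attributes,
// so the caller's span keeps its raw values and only the copy is scrubbed.
final class MaskingSink implements SpanSink {
  private final SpanSink delegate;
  private final UnaryOperator<String> masker;

  MaskingSink(SpanSink delegate, UnaryOperator<String> masker) {
    this.delegate = delegate;
    this.masker = masker;
  }

  @Override
  public void export(List<Map<String, String>> spans) {
    List<Map<String, String>> masked = new ArrayList<>();
    for (Map<String, String> span : spans) {
      Map<String, String> copy = new LinkedHashMap<>();
      span.forEach((key, value) -> copy.put(key, masker.apply(value)));
      masked.add(copy);
    }
    delegate.export(masked);
  }
}

// Collects whatever reaches "the wire" so it can be inspected.
final class CapturingSink implements SpanSink {
  final List<Map<String, String>> received = new ArrayList<>();

  @Override
  public void export(List<Map<String, String>> spans) {
    received.addAll(spans);
  }
}
```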

The class topology

Two bits worth pointing out:

MaskingSpanData extends DelegatingSpanData and lazy-masks. It only computes the masked attributes the first time the exporter accesses them. Cheap.
PiiMasker is registered with @ConditionalOnMissingBean, so if you define your own PiiMasker bean, the auto-configuration steps aside. You can swap in healthcare-specific (HIPAA) or non-US patterns without forking the library.
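Lazy masking is memoization: do the expensive regex work once, on first read, and reuse the result. Here’s a minimal sketch under that assumption. LazyMaskedAttributes is a hypothetical class; the starter’s MaskingSpanData builds on the SDK’s DelegatingSpanData instead, and its exact caching strategy may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical lazily masked attribute view: nothing is masked until get().
final class LazyMaskedAttributes {
  private final Map<String, String> raw;
  private final UnaryOperator<String> masker;
  private volatile Map<String, String> masked; // stays null until first read

  LazyMaskedAttributes(Map<String, String> raw, UnaryOperator<String> masker) {
    this.raw = Map.copyOf(raw);
    this.masker = masker;
  }

  // Double-checked locking: masks exactly once, then returns the cached result.
  Map<String, String> get() {
    Map<String, String> result = masked;
    if (result == null) {
      synchronized (this) {
        result = masked;
        if (result == null) {
          Map<String, String> copy = new HashMap<>();
          raw.forEach((key, value) -> copy.put(key, masker.apply(value)));
          result = Map.copyOf(copy);
          masked = result;
        }
      }
    }
    return result;
  }
}
```

If the exporter drops a span (sampling, a full queue), the masker never runs for it, which is where the “cheap” comes from.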

Hands-on: building the agent

Enough theory. Let’s wire one up.

Project layout

my-agent/
├── pom.xml
└── src/
    └── main/
        ├── java/
        │   └── com/example/demo/
        │       ├── DemoApplication.java
        │       └── BedrockAgentService.java
        └── resources/
            └── application.properties

That’s the whole project. Two Java files, one properties file. Watch.

`pom.xml`

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.5.9</version>
    <relativePath/>
  </parent>

  <groupId>com.example</groupId>
  <artifactId>my-agent</artifactId>
  <version>1.0.0-SNAPSHOT</version>

  <properties>
    <java.version>17</java.version>
    <spring-ai.version>1.1.2</spring-ai.version>
    <opentelemetry-instrumentation.version>2.14.0</opentelemetry-instrumentation.version>
  </properties>

  <repositories>
    <repository>
      <id>spring-snapshots</id>
      <url>https://repo.spring.io/snapshot</url>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
    <repository>
      <id>central-portal-snapshots</id>
      <url>https://central.sonatype.com/repository/maven-snapshots/</url>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-bom</artifactId>
        <version>${spring-ai.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
      <dependency>
        <groupId>io.opentelemetry.instrumentation</groupId>
        <artifactId>opentelemetry-instrumentation-bom</artifactId>
        <version>${opentelemetry-instrumentation.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <!-- The observability starter -->
    <dependency>
      <groupId>org.springaicommunity</groupId>
      <artifactId>spring-ai-agentcore-observability</artifactId>
      <version>1.1.0-SNAPSHOT</version>
    </dependency>

    <!-- Spring AI -> Amazon Bedrock Converse -->
    <dependency>
      <groupId>org.springframework.ai</groupId>
      <artifactId>spring-ai-starter-model-bedrock-converse</artifactId>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
</project>

Two <dependency> blocks. One is the observability starter, the other is Spring AI’s Bedrock starter. Everything else arrives transitively: embedded Tomcat, AOP, the OTel SDK, the AgentCore HTTP controller, and the AWS SDK.

`application.properties`

spring.application.name=my-agent

# --- Spring AI Bedrock ---
spring.ai.bedrock.converse.chat.options.model=amazon.nova-lite-v1:0
spring.ai.bedrock.aws.region=${AWS_REGION:us-east-1}

# --- Observability: opt in to prompt/completion capture ---
spring.ai.agentcore.observability.capture-content=true
spring.ai.agentcore.observability.masking.enabled=true

# Custom regex patterns for things only your org cares about.
# .properties files treat a backslash as an escape character, so a regex \b is written \\b here.
spring.ai.agentcore.observability.masking.custom-regex[0]=\\bAKIA[0-9A-Z]{16}\\b
spring.ai.agentcore.observability.masking.custom-regex[1]=\\bsk-[A-Za-z0-9]{20,}\\b

# --- OpenTelemetry SDK ---
otel.traces.exporter=logging
otel.metrics.exporter=logging
otel.logs.exporter=none

The custom regex examples are realistic ones. The first redacts AWS access key IDs (AKIA followed by 16 uppercase letters or digits). The second redacts OpenAI-style secret keys (sk- followed by 20 or more alphanumerics). If somebody pastes a credential into a prompt by accident, your APM never sees it.
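Before adding a pattern, it’s worth checking it against sample input, and checking what survives the .properties parser: a single backslash before a character like b is silently dropped, so \b in the file arrives as a plain b. This sketch uses java.util.Properties to mimic that parsing (an assumption: Spring Boot’s own properties loader handles this case the same way) and plain java.util.regex to apply the two patterns above. CustomRegexCheck is a hypothetical helper, not part of the starter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical helper for sanity-checking custom masking patterns.
final class CustomRegexCheck {
  // Parses one "p=<regex>" line with .properties escape rules and
  // returns the value the application would actually receive.
  static String loadPattern(String propertiesLine) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesLine));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props.getProperty("p");
  }

  // Applies each pattern in order, replacing every match with [REDACTED].
  static String redact(String text, List<String> regexes) {
    String out = text;
    for (String regex : regexes) {
      out = Pattern.compile(regex).matcher(out).replaceAll("[REDACTED]");
    }
    return out;
  }
}
```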

Gotcha number one: never set otel.traces.exporter=none. With that value the SDK autoconfiguration creates no span exporter at all, and since the starter’s auto-configuration hooks in through the SDK’s customizer chain by wrapping the span exporter, there is nothing to wrap and our wiring never happens. For local development use logging. For production use otlp.

The handler

package com.example.demo;

import org.springaicommunity.agentcore.annotation.AgentCoreInvocation;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.stereotype.Service;

@Service
public class BedrockAgentService {

  private final ChatModel chatModel;

  public BedrockAgentService(ChatModel chatModel) {
    this.chatModel = chatModel;
  }

  @AgentCoreInvocation
  public ChatResponse handle(String prompt) {
    return this.chatModel.call(new Prompt(new UserMessage(prompt)));
  }
}

That’s it. The @AgentCoreInvocation annotation registers your method with the AgentCore HTTP controller, so POST /invocations lands here. The aspect wraps the controller’s dispatch path, not your method directly; that’s a deliberate design choice so AOP proxies don’t strip the annotation off your bean.

Main class

package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {
  public static void main(String[] args) {
    SpringApplication.run(DemoApplication.class, args);
  }
}

Build and run:

mvn -B package
export AWS_REGION=us-east-1
java -jar target/my-agent-1.0.0-SNAPSHOT.jar
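To poke the running agent by hand, you can send the same kind of request the smoke test sends. Here’s a minimal sketch with the JDK’s java.net.http client, assuming the app listens on Spring Boot’s default port 8080; InvokeAgent is a hypothetical helper class and the header values are arbitrary examples.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for a locally running agent.
final class InvokeAgent {
  // Builds a POST /invocations request carrying the two correlation headers
  // that end up on the span.
  static HttpRequest buildRequest(String baseUrl, String sessionId, String requestId, String prompt) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/invocations"))
        .header("Content-Type", "text/plain")
        .header("x-amzn-bedrock-agentcore-session-id", sessionId)
        .header("x-amzn-request-id", requestId)
        .POST(HttpRequest.BodyPublishers.ofString(prompt))
        .build();
  }

  // Sends the request and returns the response with its body as a string.
  static HttpResponse<String> send(HttpRequest request) throws IOException, InterruptedException {
    return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
  }
}
```

Calling InvokeAgent.send(InvokeAgent.buildRequest("http://localhost:8080", "session-local-1", "req-local-1", "Reply with just the word OK.")) from a main method or JShell should return the agent’s response; with the logging exporter configured, the span should then show up in the console.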

The smoke test that proves everything

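This is the test I wrote against real Amazon Bedrock to validate every claim in the docs. Save it as src/test/java/com/example/demo/RealBedrockEndToEndTest.java (it needs spring-boot-starter-test and io.opentelemetry:opentelemetry-sdk-testing as test-scoped dependencies, which the pom.xml above doesn’t declare):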

package com.example.demo;

import static org.assertj.core.api.Assertions.assertThat;

import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.sdk.trace.data.SpanData;
import java.util.List;
import java.util.Optional;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.Test;
import org.springaicommunity.agentcore.observability.telemetry.GenAiTelemetrySupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.context.annotation.Import;
import org.springframework.http.MediaType;
import org.springframework.test.web.servlet.MockMvc;
import org.springframework.test.web.servlet.request.MockMvcRequestBuilders;
import org.springframework.test.web.servlet.result.MockMvcResultMatchers;

@SpringBootTest(classes = DemoApplication.class)
@AutoConfigureMockMvc
@Import(InMemoryExporterConfig.class)
class RealBedrockEndToEndTest {

  @Autowired
  private MockMvc mockMvc;

  @AfterEach
  void resetExporter() {
    InMemoryExporterConfig.SPAN_EXPORTER.reset();
  }

  @Test
  void everything() throws Exception {
    String body = "Customer profile:\n"
        + "Email: jane.doe@acme-corp.com\n"
        + "SSN: 123-45-6789\n"
        + "Visa: 4532-0151-1283-0366\n"
        + "Phone: 555-234-5678\n"
        + "AWS key: AKIAIOSFODNN7EXAMPLE\n"
        + "OpenAI key: sk-abc123def456ghi789jkl012mno345\n"
        + "Reply with just the word OK.";

    mockMvc.perform(MockMvcRequestBuilders.post("/invocations")
            .contentType(MediaType.TEXT_PLAIN)
            .header("x-amzn-bedrock-agentcore-session-id", "session-real-bedrock-1")
            .header("x-amzn-request-id", "req-real-bedrock-1")
            .content(body))
        .andExpect(MockMvcResultMatchers.status().isOk());

    List<SpanData> spans = InMemoryExporterConfig.SPAN_EXPORTER.getFinishedSpanItems();
    SpanData span = spans.stream()
        .filter(s -> s.getAttributes().get(GenAiTelemetrySupport.GEN_AI_PROVIDER_NAME) != null)
        .findFirst().orElseThrow();

    // GenAI semantic conventions
    assertThat(span.getAttributes().get(GenAiTelemetrySupport.GEN_AI_PROVIDER_NAME)).isEqualTo("aws.bedrock");
    assertThat(span.getAttributes().get(GenAiTelemetrySupport.GEN_AI_REQUEST_MODEL)).isEqualTo("amazon.nova-lite-v1:0");
    assertThat(span.getAttributes().get(GenAiTelemetrySupport.GEN_AI_USAGE_INPUT_TOKENS)).isGreaterThan(0L);
    assertThat(span.getAttributes().get(GenAiTelemetrySupport.GEN_AI_USAGE_OUTPUT_TOKENS)).isGreaterThan(0L);

    // AWS correlation
    assertThat(span.getAttributes().get(GenAiTelemetrySupport.AWS_BEDROCK_AGENTCORE_SESSION_ID))
        .isEqualTo("session-real-bedrock-1");
    assertThat(span.getAttributes().get(GenAiTelemetrySupport.AWS_REQUEST_ID))
        .isEqualTo("req-real-bedrock-1");

    // Masking
    String prompt = span.getEvents().stream()
        .filter(e -> e.getName().equals(GenAiTelemetrySupport.EVENT_GEN_AI_CONTENT_PROMPT))
        .map(e -> e.getAttributes().get(AttributeKey.stringKey("gen_ai.prompt")))
        .findFirst().orElseThrow();

    assertThat(prompt).doesNotContain("jane.doe@acme-corp.com");
    assertThat(prompt).doesNotContain("123-45-6789");
    assertThat(prompt).doesNotContain("4532-0151-1283-0366");
    assertThat(prompt).doesNotContain("AKIAIOSFODNN7EXAMPLE");
    assertThat(prompt).contains("j***@***.com");
    assertThat(prompt).contains("###-##-####");
    assertThat(prompt).contains("4532-****-****-0366");
    assertThat(prompt).containsPattern("\\[REDACTED]");
  }
}

Plus a tiny test config that reroutes the OTel exporter to an in-memory one (still wrapped with the masker, so we assert on what would have hit the wire):

package com.example.demo;

import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizerProvider;
import io.opentelemetry.sdk.testing.exporter.InMemorySpanExporter;
import org.springaicommunity.agentcore.observability.masking.PiiMasker;
import org.springaicommunity.agentcore.observability.masking.PiiMaskingSpanExporter;
import org.springframework.boot.test.context.TestConfiguration;
import org.springframework.context.annotation.Bean;

@TestConfiguration
public class InMemoryExporterConfig {

  public static final InMemorySpanExporter SPAN_EXPORTER = InMemorySpanExporter.create();

  @Bean
  AutoConfigurationCustomizerProvider routeToInMemory(PiiMasker piiMasker) {
    return customizer -> customizer.addSpanExporterCustomizer(
        (delegate, unused) -> new PiiMaskingSpanExporter(SPAN_EXPORTER, piiMasker));
  }
}

Run it:

export AWS_REGION=us-east-1
mvn -B test

What real Bedrock returned

Here’s the actual span captured from the live Bedrock call (amazon.nova-lite-v1:0, us-east-1, May 2026):

=== REAL BEDROCK SPAN SUMMARY ===
model               = amazon.nova-lite-v1:0
input_tokens        = 121
output_tokens       = 51
finish_reason       = end_turn
session_id          = session-real-bedrock-1
request_id          = req-real-bedrock-1

masked prompt       =
Customer profile:
Email: j***@***.com
SSN: ###-##-####
Visa: 4532-****-****-0366
Phone: ###-###-####
AWS key: [REDACTED]
OpenAI key: [REDACTED]
Reply with just the word OK.

masked completion   = Sorry, but I cannot respond to a request that might
involve sharing personal information about an individual. ...
================================

Read that output again, slowly. Six pieces of sensitive data went in (email, SSN, Luhn-valid card, phone, AWS key, OpenAI key). All six came out masked. Token counts, finish reason, model name, and both AWS correlation IDs are present and correct. Zero hand-instrumentation code.

The completion is interesting too. The model saw the unmasked prompt (because, per the architecture, masking happens at export time, not request time) and declined to answer. That tells us two things at once: the prompt actually reached Bedrock, and the model acted on its real content. Then the masker did its job before anything left the JVM.

What’s behind each masked value

The PII patterns aren’t naive regexes. The credit card mask, the trickiest one, runs every candidate number through the Luhn checksum before masking it.

That’s why 1234567890123456 doesn’t get masked: it fails Luhn. Order numbers, tracking IDs, random hashes survive intact. Your false positive rate stays low. I tested this explicitly:

String body = "Tracking number: 1234567890123456. Reply OK.";
// Assertion: prompt event still contains "1234567890123456" verbatim

Passed. The masker doesn’t touch fake-looking 16-digit strings.
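Here’s a minimal sketch of Luhn-gated card masking, assuming a candidate pattern of 13 to 19 digits with optional dash or space separators, and a keep-first-four, keep-last-four mask that matches the output above. CardMasker is hypothetical; the starter’s actual patterns may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical card masker: only Luhn-valid candidates are masked.
final class CardMasker {
  // 13-19 digits, optionally separated by single spaces or dashes.
  private static final Pattern CANDIDATE = Pattern.compile("\\b\\d(?:[ -]?\\d){12,18}\\b");

  // Luhn: from the right, double every second digit (subtracting 9 if > 9),
  // and require the total to be divisible by 10.
  static boolean passesLuhn(String digits) {
    int sum = 0;
    boolean doubleIt = false;
    for (int i = digits.length() - 1; i >= 0; i--) {
      int d = digits.charAt(i) - '0';
      if (doubleIt) {
        d *= 2;
        if (d > 9) {
          d -= 9;
        }
      }
      sum += d;
      doubleIt = !doubleIt;
    }
    return sum % 10 == 0;
  }

  static String mask(String text) {
    Matcher m = CANDIDATE.matcher(text);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String match = m.group();
      String digits = match.replaceAll("[ -]", "");
      // Luhn failures (order numbers, tracking IDs) are left untouched.
      String replacement = passesLuhn(digits) ? maskMiddle(match, digits.length()) : match;
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  // Keeps the first and last four digits and any separators; stars the rest.
  private static String maskMiddle(String match, int digitCount) {
    StringBuilder sb = new StringBuilder();
    int seen = 0;
    for (char c : match.toCharArray()) {
      if (Character.isDigit(c)) {
        seen++;
        sb.append(seen <= 4 || seen > digitCount - 4 ? c : '*');
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
```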

For emails the strategy is different: keep the first character of the local part, drop the rest, keep the TLD. jane.doe@acme-corp.com becomes j***@***.com. You preserve aggregate analytics (.com vs .gov vs .mil) while making individual identity unrecoverable. Phones are even simpler: every US format collapses to ###-###-####.
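A sketch of those rules (plus the SSN mask from the output above) with java.util.regex. The patterns are illustrative assumptions, not the starter’s: the email rule keeps the first character and the TLD, the SSN rule only covers the dashed form, and the phone rule covers a few common US shapes. ContactMasker is a hypothetical class name.

```java
import java.util.regex.Pattern;

// Hypothetical masker for emails, SSNs, and US phone numbers.
final class ContactMasker {
  // Captures the first character of the local part (group 1) and the TLD (group 2).
  private static final Pattern EMAIL = Pattern.compile(
      "\\b([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@[A-Za-z0-9.-]+\\.([A-Za-z]{2,})\\b");
  // Dashed SSN form only, e.g. 123-45-6789.
  private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
  // 555-234-5678, (555) 234-5678, 555.234.5678, +1 555 234 5678, ...
  private static final Pattern US_PHONE = Pattern.compile(
      "(?<!\\d)(?:\\+?1[ .-]?)?\\(?\\d{3}\\)?[ .-]?\\d{3}[ .-]?\\d{4}(?!\\d)");

  static String mask(String text) {
    String out = EMAIL.matcher(text).replaceAll("$1***@***.$2");
    out = SSN.matcher(out).replaceAll("###-##-####");
    return US_PHONE.matcher(out).replaceAll("###-###-####");
  }
}
```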

How to think about this in production

Once the starter is in place, you have building blocks. Here are the four most common ways teams actually use them:

1. Per-model token cost dashboards (no custom code)

The aspect records the gen_ai.client.token.usage histogram with gen_ai.token.type=input|output and gen_ai.request.model as dimensions. Point it at OTLP, build a Grafana dashboard, slice by model. Suddenly the question “which model is eating our budget” has a one-query answer.
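Under the hood, that one-query answer is a group-by over two dimensions. Here’s a plain-Java sketch of the same aggregation, assuming hypothetical TokenUsage and TokenSpend types in place of a metrics backend (the third observation in the usage below is a made-up value); in practice your backend does this for you over the exported histogram.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One observation of the token-usage histogram with its two dimensions.
record TokenUsage(String requestModel, String tokenType, long tokens) {}

final class TokenSpend {
  // Roughly "sum by (gen_ai.request.model, gen_ai.token.type)" in a query language.
  static Map<String, Long> sumByModelAndType(List<TokenUsage> observations) {
    Map<String, Long> totals = new TreeMap<>();
    for (TokenUsage o : observations) {
      totals.merge(o.requestModel() + " / " + o.tokenType(), o.tokens(), Long::sum);
    }
    return totals;
  }
}
```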

2. Cross-system request correlation

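Both AWS headers (x-amzn-bedrock-agentcore-session-id and x-amzn-request-id) get copied onto every span. When a customer complains, you can find their request by session ID, read the masked prompt and completion captured on the span, and use the request ID to pivot to provider-side logs.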

This is the workflow you want at 2 AM during an incident.

3. Compliance posture, by default

The crucial property: third parties (Datadog, New Relic, your collector) never see unredacted prompts. The architecture makes the bad thing hard to do by accident.

4. Error classification for alerting

The aspect maps exception class names to one of five error.type values: rate_limit, timeout, authentication_failure, invalid_request, and server_error.

Now your alerts can fire differently for “Bedrock is throttling us” (capacity issue, escalate to AWS) vs “we have a code bug” (page the on-call dev). One dashboard query, no manual exception classification anywhere.
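Here’s a sketch of what name-based classification can look like. The keyword rules are assumptions for illustration (ErrorTypes is hypothetical and the starter’s real mapping may differ); they’re chosen so that names such as ThrottlingException, ModelTimeoutException, AccessDeniedException, and ValidationException, which the Bedrock Runtime API uses, land in rate_limit, timeout, authentication_failure, and invalid_request respectively. Walking the cause chain handles exceptions that Spring or the AWS SDK wraps.

```java
import java.util.Locale;

// Hypothetical classifier from exception class names to error.type values.
final class ErrorTypes {
  static String classify(Throwable t) {
    Throwable cur = t;
    // Check the exception and its causes (bounded, in case of cause cycles).
    for (int depth = 0; cur != null && depth < 16; depth++, cur = cur.getCause()) {
      String name = cur.getClass().getSimpleName().toLowerCase(Locale.ROOT);
      if (name.contains("throttl") || name.contains("quota") || name.contains("ratelimit")) {
        return "rate_limit";
      }
      if (name.contains("timeout")) {
        return "timeout";
      }
      if (name.contains("accessdenied") || name.contains("unauthorized")
          || name.contains("credential") || name.contains("expiredtoken")) {
        return "authentication_failure";
      }
      if (name.contains("validation") || name.contains("illegalargument")
          || name.contains("badrequest")) {
        return "invalid_request";
      }
    }
    // Anything unrecognized is treated as a server-side failure.
    return "server_error";
  }
}
```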

What it looks like in CI

The library itself ships with an 81-test suite plus JaCoCo coverage gates (96.5% line, 83% branch). I ran mvn verify on a clean clone:

Tests run: 81, Failures: 0, Errors: 0, Skipped: 2
[INFO] All coverage checks have been met.
[INFO] BUILD SUCCESS

The two skipped tests are the live-Bedrock ones, gated behind RUN_REAL_BEDROCK_TESTS=true so CI doesn’t accidentally bill anyone’s account. They cover the same ground as the test I wrote above.

Your own CI strategy probably wants:

Mock the ChatModel in your fast tests. Keep one nightly job that hits live Bedrock to catch behavior drift in the model itself (token counts, finish reasons, PII-handling). Cap that job’s spend with a model + token budget so it can never run away.
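A spend cap can be as simple as a guard object the nightly test consults before each live call. Here’s a minimal sketch, assuming a hypothetical TokenBudget class with an allow-list of models and a hard ceiling on total tokens; you’d feed charge() from the token counts on each response, for example via Spring AI’s ChatResponse#getMetadata().getUsage().

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical spend guard for a nightly live-model job.
final class TokenBudget {
  private final Set<String> allowedModels;
  private final long maxTokens;
  private final AtomicLong used = new AtomicLong();

  TokenBudget(Set<String> allowedModels, long maxTokens) {
    this.allowedModels = Set.copyOf(allowedModels);
    this.maxTokens = maxTokens;
  }

  // Call before each live request. The cap is checked up front, so the last
  // call can overshoot it by at most one response's worth of tokens.
  void checkBeforeCall(String model) {
    if (!allowedModels.contains(model)) {
      throw new IllegalStateException("Model not allowed in live tests: " + model);
    }
    if (used.get() >= maxTokens) {
      throw new IllegalStateException("Token budget exhausted: " + used.get() + "/" + maxTokens);
    }
  }

  // Call after each response with its reported usage; returns the running total.
  long charge(long inputTokens, long outputTokens) {
    return used.addAndGet(inputTokens + outputTokens);
  }
}
```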

Production export targets

Swap otel.traces.exporter=logging for otlp and you’re done:

otel.traces.exporter=otlp
otel.metrics.exporter=otlp
otel.exporter.otlp.endpoint=http://otel-collector:4317
otel.exporter.otlp.protocol=grpc
otel.service.name=my-agent

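The masking wrapper still applies. Common targets include Datadog, Jaeger, AWS X-Ray, and Grafana Tempo, or an OpenTelemetry Collector sitting in front of any of them.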

For X-Ray specifically: the gen_ai.* attributes show up as regular span attributes in the X-Ray UI, and the aws.request_id lets you pivot directly to CloudWatch Logs and Bedrock model invocation logging.

Real-world adoption playbook

If you’re rolling this out across multiple agents in an organization, here’s the order I’d recommend:

Step 3 is the one teams skip and regret. The default patterns cover US PII and major card networks. Your business probably has internal account numbers, customer IDs, regional formats (UK NI numbers, EU VAT IDs, Indian Aadhaar) that nobody outside your org has ever heard of. Add them once, in the central starter properties, and you’re done.

Conclusion

Let’s tie this together.

Where we started

Four real production headaches, all caused by the same gap: standard observability tooling was built for HTTP services and databases, not for LLM workloads. Per-request token cost is invisible. Customer complaints can’t be traced back to specific provider calls. Prompts and completions, full of PII, leak into third-party APMs because nobody put a redaction layer in the way. Errors all look the same, so a throttling incident and a code bug page the same on-call engineer.

What we did about it

We walked through spring-ai-agentcore-observability end to end:

Set the foundation. Refreshed what observability with OpenTelemetry actually means – traces, metrics, logs, and the GenAI semantic conventions that finally give LLM workloads a standard vocabulary.
Mapped the architecture. Saw how a single Spring Boot AOP aspect wraps the AgentCore HTTP boundary, enriches spans with the GenAI attributes Spring AI already returns, and hands them to a masking exporter that scrubs PII just before bytes leave the JVM.
Built a working agent. Two dependencies in pom.xml. Three properties in application.properties. One annotated method. That is the whole footprint a developer has to remember.
Validated against real Amazon Bedrock. Posted a prompt loaded with email, SSN, Luhn-valid Visa, US phone, AWS access key, and OpenAI secret key. Confirmed the exported span carries every promised attribute, both AWS correlation IDs, positive token counts from the live model, and that every PII type was redacted before export. The Luhn-failing tracking number survived intact, proving the false-positive rejection works.
Looked at production patterns. Per-model cost dashboards, complaint-to-CloudWatch pivots, compliance-by-default, error classification routing, and a six-step adoption playbook for rolling this across an org.

What you walk away with

Day one - clone the starter, add deps, see spans in stdout
Week one - point OTLP at your APM, build a token cost dashboard
Month one - alerts on error type and rate per model, custom regex for org PII
Quarter one - all agents standardized, security signs off on prompt capture
Year one - drift detection on token deltas, model behavior changes caught in hours not weeks

That trajectory is what good infrastructure looks like – cheap to start, compounding returns over time, no rewrite when requirements grow.

The bigger picture

LLM-backed services are not going to get less complex. Multi-model routing, tool calling, agent-to-agent workflows, fine-tuned variants, on-device inference – all of it is showing up in production this year. Every one of those moves makes the cost-correlation-compliance-reliability problem worse, not better.

The only sustainable answer is observability that speaks the same language as the workload. OpenTelemetry’s GenAI conventions are that language. A starter that emits them automatically, redacts on the way out, and stays out of your way is the right place to spend a Friday afternoon if you have even one Spring AI service in production.

What to do next

If you take one thing away, take this: spans are cheap, dashboards compound, and PII leaks are forever. Wire this in before you need it.

A few concrete moves:

Today. Clone the example in this article. Point it at a low-traffic dev environment. Watch a span land in your logs.
This week. Switch the exporter to OTLP and point it at whatever APM your team already uses. Build a per-model token-spend dashboard. Show it to your finance lead.
This month. Add your org-specific custom regex patterns – internal account numbers, regional ID formats, anything your security team would flag in a trace.
This quarter. Roll the same three properties to every Spring AI service you run. Define alerts on error type. Stop writing tracing code by hand.

Resources

Code and docs. github.com/vaquarkhan/spring-ai-agentcore-observability
The original tutorial that this article validated end to end: OBSERVABILITY-TUTORIAL.md
Validation receipts. Every assertion in this article is reproduced in VALIDATION-REPORT.md alongside the source repo – environment, commands, exact span output, and the test that proved each claim.
OpenTelemetry GenAI semantic conventions. opentelemetry.io/docs/specs/semconv/gen-ai/
Spring AI reference. docs.spring.io/spring-ai/reference/
Amazon Bedrock Converse API. docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html

Closing thought

Three years ago, “observability for AI” mostly meant “we logged the prompt.” That era is over. The standards are here, the tooling is here, the GenAI semantic conventions are stable enough to bet on. There is no longer a good reason to ship an LLM service to production without proper telemetry.

The fix is two dependencies and three properties. The downside of skipping it is a finance Slack message you can’t answer, a customer complaint you can’t trace, and a security review you can’t pass. Pick the cheaper option.

Now go check what your Bedrock bill is doing.

If you’re working on agent observability, GenAI semantic conventions, or Bedrock at scale, I’d love to hear what’s broken in your stack. Drop a comment below.

Code: github.com/vaquarkhan/spring-ai-agentcore-observability
