Audit and Observability
The audit contract
Every request that passes through a YAAgents gateway produces a structured audit record. The record captures who made the request, for which tenant, on which operation, and what the outcome was. This contract is defined in ADR PI4-yaa-0001 and is implemented consistently across three components:
- otel-audit gateway plugin: emits at ingress/egress (before and after the upstream call)
- sdk-go AuditEmitter: emits at response-write time inside the Go service
- sdk-fastapi AuditEmitter: emits at response-write time inside the Python service
The canonical audit event shape (AuditEvent) has these fields:
| Field | Source | Example |
|---|---|---|
| event_type | Plugin/SDK sets | "agentic.request.received" |
| tenant_id | X-Tenant-ID header | "tenant-abc" |
| actor_id | JWT sub claim or X-Actor-Principal | "usr-xyz" |
| request_id | X-Request-ID header | "req-001" |
| correlation_id | X-Correlation-ID header | "corr-001" |
| operation | Route ID | "POST /recommendations/{customerId}" |
| resource_id | Path-param value | "cust-42" |
| outcome | Set at response time | "success" \| "clarification_required" \| … |
| timestamp | RFC3339 UTC | "2026-06-07T19:00:00Z" |
| attributes | Freeform map[string]string | {"store": "acme"} |
correlation_id is the correlation key. The gateway plugin and the SDK emitter both write this
field — operators join records on correlation_id to see the full picture across the gateway
layer and the service layer in a single trace.
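
As a rough illustration of how the Source column maps onto a record, the sketch below builds an event from request headers. It is not the SDK's code: AuditEventSketch, build_audit_event, and the jwt_sub parameter are hypothetical names, and giving the JWT sub claim precedence over X-Actor-Principal is an assumption, since the table lists both sources without ranking them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Mapping, Optional


@dataclass
class AuditEventSketch:
    """Illustrative mirror of the AuditEvent fields; not the SDK type."""
    event_type: str
    tenant_id: str
    actor_id: str
    request_id: str
    correlation_id: str
    operation: str
    resource_id: str = ""
    outcome: str = ""
    timestamp: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)


def build_audit_event(headers: Mapping[str, str], event_type: str, operation: str,
                      jwt_sub: Optional[str] = None, resource_id: str = "",
                      now: Optional[datetime] = None) -> AuditEventSketch:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    # `now` is expected to be timezone-aware; it is converted to UTC.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return AuditEventSketch(
        event_type=event_type,
        tenant_id=h.get("x-tenant-id", ""),
        # Assumption: prefer the JWT sub claim, fall back to the header.
        actor_id=jwt_sub or h.get("x-actor-principal", ""),
        request_id=h.get("x-request-id", ""),
        correlation_id=h.get("x-correlation-id", ""),
        operation=operation,
        resource_id=resource_id,
        timestamp=moment.strftime("%Y-%m-%dT%H:%M:%SZ"),  # RFC3339 UTC
    )
```

Because both the gateway and the service would read correlation_id from the same X-Correlation-ID header, records built this way on either side carry the same join key.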
SDK side (sdk-go + sdk-fastapi)
Both SDKs ship AuditEmitter as an interface/Protocol with a NoopEmitter default. The
default is injected at startup: if your service does not configure an emitter, nothing is
emitted and no dependency is added.
sdk-go (sdk-go/sdkgo/audit.go):

    type AuditEmitter interface {
        Emit(ctx context.Context, event AuditEvent) error
    }

    // NoopEmitter is the zero-cost default: zero dependencies, zero side effects.
    type NoopEmitter struct{}

    func (NoopEmitter) Emit(_ context.Context, _ AuditEvent) error { return nil }

To emit an audit event inside a Go handler:

    // ctx already carries the emitter via WithAuditEmitter(ctx, emitter) at startup.
    err := sdkgo.AuditEmit(ctx, sdkgo.AuditEvent{
        EventType:  "agentic.response.written",
        Outcome:    "success",
        ResourceID: customerId,
        Attributes: map[string]string{"model": "collab-filter-v2"},
    })

sdk-fastapi (sdk-fastapi/src/yaagents_fastapi/audit.py):

    class AuditEmitter(Protocol):
        async def emit(self, event: AuditEvent) -> None: ...

    class NoopEmitter:
        async def emit(self, event: AuditEvent) -> None:
            pass

To emit inside a FastAPI route decorated with @agentic_route:

    from yaagents_fastapi import audit_emit, AuditEvent

    await audit_emit(AuditEvent(
        event_type="agentic.response.written",
        outcome="clarification_required",
        resource_id=customer_id,
    ))

The emitter is propagated via a ContextVar in Python and via context.Context in Go. Both
approaches are dependency-injection safe and do not require global state.
OTLP exporting is opt-in:
- Go: add the github.com/ai-mpathyminds/yaagents-sdk-go/audit/otlp sub-module (its own go.mod; core sdk-go stays dependency-free)
- Python: pip install yaagents-fastapi[otel] adds opentelemetry-sdk; plain pip install yaagents-fastapi does not pull it
Gateway side (otel-audit plugin)
The otel-audit plugin emits a span named agentic.request
for every request that passes through the gateway. The span covers the full round trip:
ingress → upstream call → egress.
Span attributes use the same field names as AuditEvent:

    event_type     = "agentic.request"
    tenant_id      = X-Tenant-ID
    actor_id       = JWT sub (or X-Actor-Principal)
    request_id     = X-Request-ID
    correlation_id = X-Correlation-ID
    operation      = route ID
    outcome        = "success" | "error" (set at egress)

Configuration in gateway.yaml:

    plugins:
      otel-audit:
        enabled: true
        exporter: stdout             # "stdout" (default) | "otlp"
        include_request_body: false  # set true only when bodies are non-PII
        otlp_endpoint: ""            # required when exporter: otlp

stdout output (v0.3 baseline: structured JSON, one line per request; wrapped here for readability):

    {"level":"INFO","msg":"agentic.request","event_type":"agentic.request.received",
     "tenant_id":"tenant-abc","actor_id":"usr-xyz","request_id":"req-001",
     "correlation_id":"corr-001","operation":"POST /recommendations/{customerId}",
     "outcome":"success","timestamp":"2026-06-07T19:00:00Z"}

Prometheus counters (/metrics): The gateway exposes Prometheus-format metrics at /metrics
independently of the otel-audit plugin. These are request-level counters and latency histograms
scraped by any standard Prometheus instance:

    # Requests counted by route × outcome × tenant
    yaagents_requests_total{route="POST /recommendations/{customerId}",outcome="success",tenant_id="tenant-abc"} 42

    # End-to-end gateway latency (includes upstream round trip)
    yaagents_request_duration_seconds_bucket{route="POST /recommendations/{customerId}",le="0.1"} 38

    # Upstream-only latency (gateway processing time excluded)
    yaagents_upstream_duration_seconds_bucket{route="POST /recommendations/{customerId}",le="0.05"} 35

Prometheus metrics are available regardless of whether otel-audit is enabled. They are the lightweight always-on signal; otel-audit (stdout or OTLP) is the richer per-request record.
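
For ad hoc checks without a Prometheus server, the sample lines above can be parsed directly. The sketch below is a simplified reader for labelled samples only, not a full exposition-format parser: parse_sample is a hypothetical helper, it skips nothing but leading and trailing whitespace, and it does not unescape label values. One detail worth noting is that the route label itself contains braces ({customerId}), so the label block has to be matched up to the last closing brace.

```python
import re

# Metric name, a brace-delimited label block, then the sample value.
# The greedy ".*" deliberately runs to the LAST "}" on the line, because
# label values such as route="POST /recommendations/{customerId}" contain braces.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# One key="value" pair; the value may contain backslash-escaped characters.
_LABEL = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Parse one labelled sample line into (name, labels, value).

    Simplification: lines without a label block, comments, and
    timestamps after the value are not handled.
    """
    match = _SAMPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labelled sample line: {line!r}")
    labels = {m.group("key"): m.group("val")
              for m in _LABEL.finditer(match.group("labels"))}
    return match.group("name"), labels, float(match.group("value"))
```

For production use, a maintained parser from a Prometheus client library is the safer choice; this sketch only shows the shape of the data.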
Putting it together
A single recommendation request produces two correlated audit records:
- Gateway record (otel-audit plugin, at ingress/egress):
  event_type: "agentic.request", correlation_id: "corr-001", outcome: "success"
  - The ingress side is emitted before the request body reaches the recommendation service, so a gateway record survives even if the service crashes mid-request.
- Service record (sdk-go or sdk-fastapi AuditEmitter, at response-write time):
  event_type: "agentic.response.written", correlation_id: "corr-001", outcome: "success", resource_id: "cust-42", attributes: {"model": "collab-filter-v2"}
  - Written inside the service handler, so it carries service-level detail the gateway cannot know.
Join on correlation_id to see the full picture:

    corr-001 | gateway → otel-audit | ingress 19:00:00.000 → egress 19:00:00.048 | outcome=success
    corr-001 | service → sdk-go     | response written 19:00:00.044             | model=collab-filter-v2

The gateway record gives timing and tenant context; the service record gives business context. Together they satisfy a non-repudiation requirement without coupling the gateway to service internals.
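
The join itself can be sketched in a few lines, assuming both record streams have already been loaded as dictionaries that carry a correlation_id key. The function names join_by_correlation_id and gateway_only are hypothetical, as is the in-memory approach; a real deployment would more likely run this join in a log or trace backend.

```python
from collections import defaultdict


def join_by_correlation_id(gateway_records, service_records):
    """Group gateway and service audit records under their shared correlation_id."""
    joined = defaultdict(lambda: {"gateway": [], "service": []})
    for rec in gateway_records:
        joined[rec["correlation_id"]]["gateway"].append(rec)
    for rec in service_records:
        joined[rec["correlation_id"]]["service"].append(rec)
    return dict(joined)


def gateway_only(joined):
    """Correlation IDs seen at the gateway with no matching service record,
    for example because the service failed before writing its response."""
    return sorted(cid for cid, pair in joined.items()
                  if pair["gateway"] and not pair["service"])
```

A correlation ID that shows up in gateway_only is the case the gateway record exists for: the request reached the gateway, but no service-level record was ever written.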
v0.3 baseline (stdout) vs v0.4 OTLP
v0.3 baseline — three observability channels, all zero-config:
- Stdout log lines (otel-audit plugin): structured JSON, one line per gateway-proxied request.
- X-Correlation-ID propagation: gateway assigns a correlation ID to every request and forwards it to the upstream; log lines include correlation_id for manual trace joining.
- Prometheus counters (/metrics): request totals and latency histograms, available for scraping by any standard Prometheus agent.
Neither sdk-go nor sdk-fastapi had AuditEmitter in v0.3; service-level audit was manual.
v0.4 additions:
- SDK AuditEmitter interface + NoopEmitter default shipped in both sdk-go and sdk-fastapi (Goal 1, ADR PI4-yaa-0001)
- OTLP exporter added as opt-in to the otel-audit plugin (exporter: otlp)
- SDK OTLP exporter available as an opt-in sub-module/extra
The stdout exporter remains the default in v0.4 — operators that do not need a collector have zero new infrastructure requirements.
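
The exporter rules described for gateway.yaml (stdout as the default, otlp_endpoint required when exporter is otlp, include_request_body off unless set) can be expressed as a small pre-flight check. This is a hypothetical helper operating on an already-parsed otel-audit mapping, not something the gateway is known to ship; the name normalize_otel_audit and the error messages are assumptions.

```python
ALLOWED_EXPORTERS = {"stdout", "otlp"}


def normalize_otel_audit(cfg: dict) -> dict:
    """Apply the documented defaults and reject inconsistent settings.

    `cfg` is the parsed `plugins.otel-audit` mapping from gateway.yaml.
    """
    out = dict(cfg)  # do not mutate the caller's mapping
    out.setdefault("exporter", "stdout")           # documented default
    out.setdefault("include_request_body", False)  # documented default
    out.setdefault("otlp_endpoint", "")

    if out["exporter"] not in ALLOWED_EXPORTERS:
        raise ValueError(f"unknown exporter: {out['exporter']!r}")
    if out["exporter"] == "otlp" and not out["otlp_endpoint"]:
        raise ValueError("otlp_endpoint is required when exporter is 'otlp'")
    return out
```

Running a check like this in CI catches the most likely misconfiguration, switching to otlp without supplying an endpoint, before the gateway starts.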
Performance impact at BENCH-5 baselines (p99):
- exporter: stdout: +36.8 ms vs no-plugin baseline
- exporter: otlp: +5.9 ms additional vs stdout (collector absorbs the I/O)
PI5-yaa forward link: structured observability dashboard examples (Grafana / Tempo) for OTLP traces are planned for the next increment. Community contributions for Grafana dashboard JSON are welcome via the #observability discussion.
Privacy and PII boundary
The include_request_body: false default is intentional. Request bodies frequently contain
PII — customer IDs, product preferences, health data, financial information — and writing
those to a log or trace store without scrubbing can create GDPR/PDPA compliance problems.
| Attribute | PII risk | Guidance |
|---|---|---|
| tenant_id | Semi-sensitive (opaque ID, not a personal name) | Safe to include |
| actor_id | Semi-sensitive (opaque ID, not an email) | Safe to include |
| request_id / correlation_id | No PII | Safe to include |
| operation | No PII | Safe to include |
| outcome | No PII | Safe to include |
| request_body (opt-in) | HIGH risk: often contains PII | Enable only after legal review |
| attributes (freeform) | Caller responsibility | Do not write secrets or personal data |
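
Since the attributes map is the caller's responsibility, one defensive option is to filter it before emitting. The sketch below is a crude heuristic under stated assumptions, not a compliance control and not part of either SDK: scrub_attributes, the denied key fragments, and the email pattern are all illustrative choices that a real service would need to tune.

```python
import re

# Hypothetical key fragments that suggest a secret or personal datum.
_DENIED_KEY_PARTS = ("password", "secret", "token", "email", "ssn")
# Deliberately loose pattern: anything shaped like local@domain.tld.
_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def scrub_attributes(attributes: dict) -> dict:
    """Return a copy of `attributes` without entries that look sensitive."""
    clean = {}
    for key, value in attributes.items():
        if any(part in key.lower() for part in _DENIED_KEY_PARTS):
            continue  # key name suggests a secret or personal field
        if _EMAIL.search(value):
            continue  # value looks like an email address
        clean[key] = value
    return clean
```

A filter like this reduces accidental leaks but cannot recognise every form of personal data, so it complements the guidance in the table rather than replacing it.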