Audit and Observability

The audit contract

Every request that passes through a YAAgents gateway produces a structured audit record. The record captures who made the request, for which tenant, on which operation, and what the outcome was. This contract is defined in ADR PI4-yaa-0001 and is implemented consistently across three components:

  • otel-audit gateway plugin — emits at ingress/egress (before and after the upstream call)
  • sdk-go AuditEmitter — emits at response-write time inside the Go service
  • sdk-fastapi AuditEmitter — emits at response-write time inside the Python service

The canonical audit event shape (AuditEvent) has these fields:

Field            Source                               Example
event_type       Plugin/SDK sets                      "agentic.request.received"
tenant_id        X-Tenant-ID header                   "tenant-abc"
actor_id         JWT sub claim or X-Actor-Principal   "usr-xyz"
request_id       X-Request-ID header                  "req-001"
correlation_id   X-Correlation-ID header              "corr-001"
operation        Route ID                             "POST /recommendations/{customerId}"
resource_id      Path-param value                     "cust-42"
outcome          Set at response time                 "success" | "clarification_required" | …
timestamp        RFC3339 UTC                          "2026-06-07T19:00:00Z"
attributes       Freeform map[string]string           {"store": "acme"}
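To make the shape above concrete, the fields could be modelled as a Python dataclass. This is only a sketch: the real AuditEvent type ships with the SDKs, and the class name `AuditEventSketch`, the empty-string defaults, and the `now_rfc3339` helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def now_rfc3339() -> str:
    """Hypothetical helper: current UTC time as RFC3339 with a 'Z' suffix."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class AuditEventSketch:
    """Illustrative stand-in for AuditEvent; not the SDK's actual class."""
    event_type: str
    tenant_id: str = ""
    actor_id: str = ""
    request_id: str = ""
    correlation_id: str = ""
    operation: str = ""
    resource_id: str = ""
    outcome: str = ""
    timestamp: str = field(default_factory=now_rfc3339)
    attributes: dict[str, str] = field(default_factory=dict)


event = AuditEventSketch(
    event_type="agentic.request.received",
    tenant_id="tenant-abc",
    actor_id="usr-xyz",
    request_id="req-001",
    correlation_id="corr-001",
    operation="POST /recommendations/{customerId}",
    resource_id="cust-42",
    outcome="success",
    attributes={"store": "acme"},
)
record = asdict(event)  # plain dict, ready for JSON serialisation
```

Keeping `attributes` as a string-to-string map mirrors the `map[string]string` type listed in the table.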

correlation_id is the correlation key. The gateway plugin and the SDK emitter both write this field — operators join records on correlation_id to see the full picture across the gateway layer and the service layer in a single trace.

SDK side (sdk-go + sdk-fastapi)

Both SDKs ship AuditEmitter as an interface (Go) or Protocol (Python) with a NoopEmitter default. The default is injected at startup: if your service does not configure an emitter, nothing is emitted and no extra dependency is pulled in.

sdk-go (sdk-go/sdkgo/audit.go):

type AuditEmitter interface {
    Emit(ctx context.Context, event AuditEvent) error
}

// NoopEmitter is the zero-cost default: zero dependencies, zero side effects.
type NoopEmitter struct{}

func (NoopEmitter) Emit(_ context.Context, _ AuditEvent) error { return nil }

To emit an audit event inside a Go handler:

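// ctx already carries the emitter via WithAuditEmitter(ctx, emitter) at startup.
err := sdkgo.AuditEmit(ctx, sdkgo.AuditEvent{
    EventType:  "agentic.response.written",
    Outcome:    "success",
    ResourceID: customerId,
    Attributes: map[string]string{"model": "collab-filter-v2"},
})
if err != nil {
    // How to react is up to the service; logging without failing the request is one option.
    log.Printf("audit emit failed: %v", err)
}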

sdk-fastapi (sdk-fastapi/src/yaagents_fastapi/audit.py):

from typing import Protocol

class AuditEmitter(Protocol):
    async def emit(self, event: AuditEvent) -> None: ...

class NoopEmitter:
    async def emit(self, event: AuditEvent) -> None:
        pass

To emit inside a FastAPI route decorated with @agentic_route:

from yaagents_fastapi import audit_emit, AuditEvent

await audit_emit(AuditEvent(
    event_type="agentic.response.written",
    outcome="clarification_required",
    resource_id=customer_id,
))

The emitter is propagated via a ContextVar in Python and via context.Context in Go — both approaches are dependency-injection safe and do not require global state.

OTLP exporting is opt-in:

  • Go: add the github.com/ai-mpathyminds/yaagents-sdk-go/audit/otlp sub-module (its own go.mod; core sdk-go stays dep-free)
  • Python: pip install yaagents-fastapi[otel] adds opentelemetry-sdk; plain pip install yaagents-fastapi does not pull it

Gateway side (otel-audit plugin)

The otel-audit plugin emits a span named agentic.request for every request that passes through the gateway. The span covers the full round trip: ingress → upstream call → egress.

Span attributes use the same field names as AuditEvent:

event_type = "agentic.request"
tenant_id = X-Tenant-ID
actor_id = JWT sub (or X-Actor-Principal)
request_id = X-Request-ID
correlation_id = X-Correlation-ID
operation = route ID
outcome = "success" | "error" (set at egress)
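The header-to-attribute mapping listed above might look like this in outline. The function name `span_attributes`, the empty-string default for missing headers, and the precedence of the JWT sub claim over X-Actor-Principal are assumptions for illustration; the plugin's actual code is not shown in this document.

```python
from typing import Mapping, Optional


def span_attributes(
    headers: Mapping[str, str],
    route_id: str,
    jwt_sub: Optional[str] = None,
) -> dict[str, str]:
    """Build ingress-time span attributes from request headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "event_type": "agentic.request",
        "tenant_id": h.get("x-tenant-id", ""),
        # Assumed precedence: JWT sub first, then X-Actor-Principal.
        "actor_id": jwt_sub or h.get("x-actor-principal", ""),
        "request_id": h.get("x-request-id", ""),
        "correlation_id": h.get("x-correlation-id", ""),
        "operation": route_id,
        # "outcome" is deliberately absent here: it is only set at egress.
    }


attrs = span_attributes(
    {
        "X-Tenant-ID": "tenant-abc",
        "X-Actor-Principal": "usr-xyz",
        "X-Request-ID": "req-001",
        "X-Correlation-ID": "corr-001",
    },
    route_id="POST /recommendations/{customerId}",
)
```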

Configuration in gateway.yaml:

plugins:
  otel-audit:
    enabled: true
    exporter: stdout             # "stdout" (default) | "otlp"
    include_request_body: false  # set true only when bodies are non-PII
    otlp_endpoint: ""            # required when exporter: otlp
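The constraints stated in the config comments (two exporter values, otlp_endpoint required for otlp) can be checked before deployment with a small script. The sketch below works on an already-parsed mapping; `validate_otel_audit` is a hypothetical helper, the endpoint value is a placeholder, and whether the gateway performs an equivalent check at startup is not stated here.

```python
def validate_otel_audit(cfg: dict) -> list[str]:
    """Return a list of problems found in an otel-audit config mapping."""
    problems = []
    exporter = cfg.get("exporter", "stdout")  # stdout is the documented default
    if exporter not in ("stdout", "otlp"):
        problems.append(f"unknown exporter: {exporter!r}")
    if exporter == "otlp" and not cfg.get("otlp_endpoint"):
        problems.append("otlp_endpoint is required when exporter is 'otlp'")
    return problems


ok_stdout = validate_otel_audit({"enabled": True, "exporter": "stdout"})
missing_endpoint = validate_otel_audit({"enabled": True, "exporter": "otlp", "otlp_endpoint": ""})
# Placeholder endpoint, not a real default.
ok_otlp = validate_otel_audit({"exporter": "otlp", "otlp_endpoint": "http://collector.example:4317"})
```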

stdout output (v0.3 baseline — structured JSON, one line per request):

{"level":"INFO","msg":"agentic.request","event_type":"agentic.request.received",
"tenant_id":"tenant-abc","actor_id":"usr-xyz","request_id":"req-001",
"correlation_id":"corr-001","operation":"POST /recommendations/{customerId}",
"outcome":"success","timestamp":"2026-06-07T19:00:00Z"}

Prometheus counters (/metrics): The gateway exposes Prometheus-format metrics at /metrics independently of the otel-audit plugin. These are request-level counters and latency histograms that any standard Prometheus instance can scrape:

# Requests counted by route × outcome × tenant
yaagents_requests_total{route="POST /recommendations/{customerId}",outcome="success",tenant_id="tenant-abc"} 42
# End-to-end gateway latency (includes upstream round trip)
yaagents_request_duration_seconds_bucket{route="POST /recommendations/{customerId}",le="0.1"} 38
# Upstream-only latency (gateway processing time excluded)
yaagents_upstream_duration_seconds_bucket{route="POST /recommendations/{customerId}",le="0.05"} 35

Prometheus metrics are available regardless of whether otel-audit is enabled. They are the lightweight always-on signal; otel-audit (stdout or OTLP) is the richer per-request record.
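When reading these samples with ad hoc scripts, note that the route label value itself contains braces ({customerId}), so splitting on the first "}" gives the wrong result. A minimal sketch of a parser that handles this, assuming every sample carries a label set (`parse_sample` is a hypothetical helper; an established Prometheus client library is the better choice outside of quick checks):

```python
import re

# Matches key="value" pairs, allowing backslash-escaped characters in the value.
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one exposition-format sample into (name, labels, value)."""
    open_idx = line.index("{")
    # Use the LAST closing brace: label values such as the route may contain "}".
    close_idx = line.rindex("}")
    name = line[:open_idx]
    labels = dict(_LABEL.findall(line[open_idx + 1:close_idx]))
    value = float(line[close_idx + 1:].split()[0])
    return name, labels, value


name, labels, value = parse_sample(
    'yaagents_requests_total{route="POST /recommendations/{customerId}",'
    'outcome="success",tenant_id="tenant-abc"} 42'
)
```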

Putting it together

A single recommendation request produces two correlated audit records:

  1. Gateway record (otel-audit plugin, at ingress/egress):

    • event_type: "agentic.request", correlation_id: "corr-001", outcome: "success"
    • The ingress side is written BEFORE the request body reaches the recommendation service, so a gateway record survives even if the service crashes mid-request; the outcome is filled in at egress.
  2. Service record (sdk-go or sdk-fastapi AuditEmitter, at response-write time):

    • event_type: "agentic.response.written", correlation_id: "corr-001", outcome: "success", resource_id: "cust-42", attributes: {"model": "collab-filter-v2"}
    • Written inside the service handler — carries service-level detail the gateway cannot know.

Join on correlation_id to see the full picture:

corr-001 | gateway → otel-audit | ingress 19:00:00.000 → egress 19:00:00.048 | outcome=success
corr-001 | service → sdk-go | response written 19:00:00.044 | model=collab-filter-v2

The gateway record gives timing and tenant context; the service record gives business context. Together they satisfy a non-repudiation requirement without coupling the gateway to service internals.
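The join can be sketched in a few lines once both record streams are loaded. The `source` key used to tell gateway records from service records is hypothetical (it is not an AuditEvent field), and the corr-002 record is invented sample data showing a request whose service record never arrived.

```python
from collections import defaultdict

records = [
    {"source": "gateway", "correlation_id": "corr-001", "outcome": "success"},
    {"source": "service", "correlation_id": "corr-001", "outcome": "success",
     "attributes": {"model": "collab-filter-v2"}},
    # Invented example: the gateway saw the request, the service never wrote a record.
    {"source": "gateway", "correlation_id": "corr-002", "outcome": "error"},
]


def join_by_correlation(recs: list[dict]) -> dict[str, dict[str, dict]]:
    """Group records by correlation_id, keyed by which layer produced them."""
    grouped: dict[str, dict[str, dict]] = defaultdict(dict)
    for rec in recs:
        grouped[rec["correlation_id"]][rec["source"]] = rec
    return dict(grouped)


joined = join_by_correlation(records)
# Correlation IDs with a gateway record but no service record.
gateway_only = sorted(cid for cid, pair in joined.items() if "service" not in pair)
```

Listing the gateway-only IDs is a quick way to find requests that reached the gateway but never produced a service-level record.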

v0.3 baseline (stdout) vs v0.4 OTLP

v0.3 baseline — three observability channels, all zero-config:

  • Stdout log lines (otel-audit plugin): structured JSON, one line per gateway-proxied request.
  • X-Correlation-ID propagation: gateway assigns a correlation ID to every request and forwards it to the upstream; log lines include correlation_id for manual trace joining.
  • Prometheus counters (/metrics): request totals and latency histograms, available for scraping by any standard Prometheus agent.

Neither sdk-go nor sdk-fastapi had AuditEmitter in v0.3; service-level audit was manual.
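Manual trace joining over the stdout log lines can be as simple as filtering JSON lines by correlation_id. The sketch below assumes the gateway's stdout has been captured to a list of strings; `find_by_correlation` is a hypothetical helper, and the second and third input lines are invented to show that non-audit output is skipped.

```python
import json


def find_by_correlation(lines: list[str], correlation_id: str) -> list[dict]:
    """Return the parsed audit records whose correlation_id matches."""
    matches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line; ignore it
        if isinstance(rec, dict) and rec.get("correlation_id") == correlation_id:
            matches.append(rec)
    return matches


log_lines = [
    '{"level":"INFO","msg":"agentic.request","tenant_id":"tenant-abc",'
    '"correlation_id":"corr-001","outcome":"success"}',
    "gateway listening on :8080",  # invented non-JSON line
    '{"level":"INFO","msg":"agentic.request","tenant_id":"tenant-abc",'
    '"correlation_id":"corr-002","outcome":"success"}',
]
hits = find_by_correlation(log_lines, "corr-001")
```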

v0.4 additions:

  • SDK AuditEmitter interface + NoopEmitter default shipped in both sdk-go and sdk-fastapi (Goal 1, ADR PI4-yaa-0001)
  • OTLP exporter added as opt-in to the otel-audit plugin (exporter: otlp)
  • SDK OTLP exporter available as an opt-in sub-module/extra

The stdout exporter remains the default in v0.4, so operators who do not need a collector have zero new infrastructure requirements.

Performance impact at BENCH-5 baselines (p99):

  • exporter: stdout adds +36.8 ms vs the no-plugin baseline
  • exporter: otlp adds a further +5.9 ms on top of stdout (collector absorbs the I/O)

PI5-yaa forward link: structured observability dashboard examples (Grafana / Tempo) for OTLP traces are planned for the next increment. Community contributions for Grafana dashboard JSON are welcome via the #observability discussion.

Privacy and PII boundary

The include_request_body: false default is intentional. Request bodies frequently contain PII — customer IDs, product preferences, health data, financial information — and writing those to a log or trace store without scrubbing can create GDPR/PDPA compliance problems.

Attribute                     PII risk                                          Guidance
tenant_id                     Semi-sensitive (opaque ID, not a personal name)   Safe to include
actor_id                      Semi-sensitive (opaque ID, not an email)          Safe to include
request_id / correlation_id   No PII                                            Safe to include
operation                     No PII                                            Safe to include
outcome                       No PII                                            Safe to include
request_body (opt-in)         HIGH risk (often contains PII)                    Enable only after legal review
attributes (freeform)         Caller responsibility                             Do not write secrets or personal data
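Since the freeform attributes map is the caller's responsibility, a service may want a small guard in front of its emitter. The sketch below drops attributes whose key looks sensitive; the `scrub_attributes` helper and its key hints are hypothetical and not part of either SDK, and key-name matching cannot catch personal data hidden in values, so it supplements review rather than replacing it.

```python
# Hypothetical denylist of substrings that suggest a sensitive attribute key.
SENSITIVE_KEY_HINTS = ("password", "secret", "token", "email", "ssn")


def scrub_attributes(attrs: dict[str, str]) -> dict[str, str]:
    """Return a copy of attrs without keys that look sensitive."""
    return {
        key: value
        for key, value in attrs.items()
        if not any(hint in key.lower() for hint in SENSITIVE_KEY_HINTS)
    }


clean = scrub_attributes({
    "model": "collab-filter-v2",
    "user_email": "someone@example.com",
    "API_TOKEN": "abc123",
})
```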