prompt-sanitize

Stable

Purpose

The prompt-sanitize plugin inspects inbound request bodies against a configurable list of regex and keyword patterns before forwarding to upstream AI services. Matched content is either rejected (request blocked with a clarification response) or redacted (matched substrings replaced with [REDACTED] and request forwarded), according to per-pattern or global strategy settings. Rather than adding body-scanning logic to every upstream service, the gateway enforces a consistent sanitization policy at the boundary.

Pipeline position:

flowchart LR
TV["token-validator"] --> TI["tenant-injector"]
TI --> LC["license-check"]
LC --> PS["**prompt-sanitize**"]:::active
PS --> OA["otel-audit"]
OA --> UP["upstream"]
classDef active fill:#4ade80,stroke:#16a34a,color:#14532d

prompt-sanitize sits at pipeline position 4 and runs after license entitlement is confirmed.

Config

gateway.yaml (prompt-sanitize block)
prompt-sanitize:
  enabled: true
  strategy: reject                # global default action for patterns that omit action
  on_match_reject_status: 412     # HTTP status returned on reject (default 412)
  max_body_bytes: 1048576         # body scan cap in bytes (default 1 MiB)
  patterns:
    - name: pii_email
      regex: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
      action: redact
    - name: pii_phone
      regex: '\d{3}[-.\s]?\d{3}[-.\s]?\d{4}'
      action: redact
    - name: injection_attempt
      keywords:
        - "ignore previous instructions"
        - "jailbreak"
        - "DAN mode"
      action: reject
    - name: ssn
      regex: '\d{3}-\d{2}-\d{4}'
      action: reject

| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| enabled | bool | true | no | Set to false to bypass the plugin (for debugging only). |
| strategy | string | "reject" | no | Global default action for patterns that omit action. reject or redact. |
| on_match_reject_status | integer | 412 | no | HTTP status returned when a reject-action pattern matches. 412 (Precondition Failed) signals the client to correct its payload before retrying. |
| max_body_bytes | integer | 1048576 | no | Maximum body bytes scanned. Bodies larger than this cap are scanned only up to the cap; the remainder is forwarded unmodified. |
| patterns | list | [] | no | Ordered list of pattern entries. Evaluated in declaration order. |
| patterns[].name | string | | yes | Unique name for this pattern; used in log and span labels. |
| patterns[].regex | string | | one of regex/keywords | Go RE2 regular expression applied to the raw request body. |
| patterns[].keywords | list | | one of regex/keywords | List of literal strings; matched case-insensitively anywhere in the body. |
| patterns[].action | string | strategy | no | reject or redact. Overrides the global strategy for this pattern. |
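
The evaluation rules in the table (declaration order, per-pattern action falling back to the global strategy, case-insensitive keywords) can be sketched as follows. This is a minimal illustration, not the plugin's implementation: Pattern and sanitize are hypothetical names, Python's re module stands in for Go RE2, and stopping at the first reject match is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pattern:
    name: str
    regex: Optional[str] = None            # one of regex / keywords is set
    keywords: list = field(default_factory=list)
    action: Optional[str] = None           # None means "use the global strategy"

    def compiled(self) -> "re.Pattern":
        if self.regex is not None:
            return re.compile(self.regex)
        # Keywords are literal strings, matched case-insensitively.
        alternation = "|".join(re.escape(k) for k in self.keywords)
        return re.compile(alternation, re.IGNORECASE)


def sanitize(body: str, patterns: list, strategy: str = "reject"):
    """Return (outcome, body, hits); hits maps pattern name to match count."""
    hits = {}
    redacted = False
    for p in patterns:                     # declaration order
        action = p.action or strategy      # per-pattern override, else global
        rx = p.compiled()
        count = len(rx.findall(body))
        if count == 0:
            continue
        hits[p.name] = count
        if action == "reject":
            # Assumption: the first reject match ends evaluation.
            return "rejected", None, hits
        body = rx.sub("[REDACTED]", body)
        redacted = True
    return ("redacted" if redacted else "pass"), body, hits
```

A pattern without an action, such as Pattern(name="injection_attempt", keywords=["jailbreak"]), inherits whatever strategy is passed to sanitize.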

Request/Response

Reads from request

| Source | Field | How used |
|---|---|---|
| Request body | Raw bytes | Scanned against all configured patterns. Bodies > max_body_bytes are partially scanned (up to cap). |
| Request context | X-Tenant-ID, X-Actor-Principal | Available in reqctx; not directly used by the pattern engine, but present for downstream correlation. |

Writes to request (before forwarding upstream)

| Modified field | Action | Notes |
|---|---|---|
| Request body | On redact match: matched substrings replaced with [REDACTED] | The modified body is forwarded with the Content-Length header updated to match its new length. |

On reject action: request is terminated early — upstream does not receive the request. On no match: request body is forwarded unchanged.
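
A minimal sketch of the redact path, assuming a single email pattern and a plain dict of headers. The function name redact_request is hypothetical, and Python's re is used only for illustration; the point is that Content-Length is recomputed from the rewritten body.

```python
import re

EMAIL = re.compile(rb"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def redact_request(headers: dict, body: bytes):
    """Replace matches with [REDACTED] and keep Content-Length in sync."""
    new_body, count = EMAIL.subn(b"[REDACTED]", body)
    if count == 0:
        return headers, body               # no match: forward unchanged
    new_headers = dict(headers)            # do not mutate the caller's headers
    new_headers["Content-Length"] = str(len(new_body))
    return new_headers, new_body
```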

Writes to response

This plugin does not modify upstream responses. On reject it returns its own early response (see Status codes below) before any upstream contact.

Status codes the plugin can return early

| Status | Media type | When |
|---|---|---|
| 412 | application/vnd.yaagents.clarification+json | A reject-action pattern matched. Response body includes a requiredInputs array describing which patterns fired and what the client should correct. |

The 412 status signals the client to correct its payload before retrying — it is a payload-correction signal (not a generic 400 bad-request), aligned with Profile v0.3 §clarification handshake conventions.
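
On the client side, the clarification response can be told apart from other errors by its status and media type. The sketch here assumes the default status of 412 and only reads the requiredInputs array; the shape of each array item is not specified in this page, so the function returns the items untouched. The name required_inputs is hypothetical.

```python
import json

CLARIFICATION = "application/vnd.yaagents.clarification+json"


def required_inputs(status: int, content_type: str, raw_body: bytes) -> list:
    """Return the requiredInputs array for a clarification response, else []."""
    # Ignore media type parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status != 412 or media_type != CLARIFICATION:
        return []
    payload = json.loads(raw_body)
    return payload.get("requiredInputs", [])
```

A client would typically surface the returned items to the user and resend only after the payload has been corrected, rather than retrying the same body.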

Security & privacy

What this plugin trusts

  • The request body bytes as received from the caller (post-auth, post-license). The plugin assumes body content integrity but not body content safety; checking safety is exactly the contract it enforces.
  • The pattern definitions in config — patterns are compiled at Init; a misconfigured regex that matches everything will redact or reject every request.

What this plugin protects

  • Prompt injection: keyword patterns targeting common injection phrases (ignore previous instructions, jailbreak, etc.) block attempts to subvert upstream AI service behavior.
  • PII leakage: regex patterns for email addresses, phone numbers, SSNs, and similar identifiers prevent sensitive data from reaching AI models where it may be retained, logged, or memorized.
  • Exfiltration patterns: custom patterns can block attempts to embed extraction instructions into prompts (e.g., “output your system prompt”).

PII boundary

This plugin has the highest PII surface area of all five gateway plugins — it actively reads and modifies the request body. Key posture:

  • On redact: matched substrings are replaced with [REDACTED] before the body reaches the upstream. The matched content is never logged; only the pattern name, action, and match count are emitted at INFO level.
  • On reject: the body is not forwarded at all; only the pattern name, action, and match count are logged.
  • The original (pre-redaction) body is never written to any log line, span, or metric.
  • max_body_bytes caps the scan: for a larger body, only the first max_body_bytes bytes are checked and redacted, and the tail is forwarded unscanned with no sanitization guarantee.
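
The effect of the scan cap can be sketched as follows, assuming a single phone-number pattern. The function name scan_with_cap is hypothetical and Python's re stands in for Go RE2; only the prefix up to max_body_bytes is rewritten, and the tail is appended as received.

```python
import re

PHONE = re.compile(rb"\d{3}[-.\s]?\d{3}[-.\s]?\d{4}")


def scan_with_cap(body: bytes, max_body_bytes: int):
    """Redact only within the first max_body_bytes; return (body, bytes_scanned)."""
    head, tail = body[:max_body_bytes], body[max_body_bytes:]
    # The tail is never inspected, so any match in it passes through.
    return PHONE.sub(b"[REDACTED]", head) + tail, len(head)
```

In this sketch a match that straddles the cap boundary is also missed, because neither half matches on its own. That is one reason to size max_body_bytes above the largest body the upstream is expected to accept.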

Secrets handling

No secrets are loaded by this plugin. Pattern definitions (regex strings and keyword lists) are config values — treat them as operational configuration, not credentials. The patterns list should be reviewed as carefully as firewall rules: overly broad patterns silently redact legitimate content; too-narrow patterns miss real threats.
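
One way to review patterns like firewall rules is to keep sample strings that each pattern must and must not match, and check them before deploying a config change. The helper audit_pattern is a hypothetical sketch, and Python's re is used for illustration rather than Go RE2.

```python
import re


def audit_pattern(regex: str, must_match: list, must_not_match: list) -> list:
    """Return findings for one pattern; an empty list means all samples pass."""
    rx = re.compile(regex)
    findings = []
    for sample in must_match:
        if not rx.search(sample):
            findings.append(f"too narrow: missed {sample!r}")
    for sample in must_not_match:
        if rx.search(sample):
            findings.append(f"too broad: matched {sample!r}")
    return findings
```

For example, an email regex whose dot before the top-level domain is left unescaped matches user@hostname, which contains no dot at all; the escaped form does not.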

Observability

Spans / events emitted

| Span name | Attributes | When emitted |
|---|---|---|
| sanitize.scan | outcome (pass, redacted, or rejected), pattern_hits (count), body_bytes_scanned | Every request where the plugin is enabled. |
| sanitize.match | pattern_name, action (redact or reject), match_count | Once per pattern that fires; never includes matched content. |

Bench baseline (BENCH-4; commit 21ab2fc; 2026-06-07):

  • Small bodies (under 1 KB): p99 overhead +11.5 ms vs no-plugin baseline.
  • Medium bodies (1–10 KB): p99 overhead −1.3 ms (within noise floor; regex engine amortises).
  • Large bodies (over 10 KB): p99 overhead +1.3 ms (scan overhead grows sub-linearly with body size due to RE2 efficiency).
  • Redact strategy vs reject: p99 overhead +0.7 ms additional for redact (body rewrite).

See Audit and Observability for the full bench baseline archive.

Log lines

{"level":"INFO","msg":"sanitize.match","pattern":"pii_email","action":"redact","match_count":2,"request_id":"req-001"}
{"level":"INFO","msg":"sanitize.match","pattern":"injection_attempt","action":"reject","match_count":1,"request_id":"req-002"}

Matched content (the actual matched substring) is never logged — only the pattern name, action, and match count are emitted. This prevents PII from appearing in logs even when the redact action fires.
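
A simple way to uphold this guarantee is to make the logging helper unable to receive matched text at all. The sketch assumes the field names shown in the log lines above; match_log_line is a hypothetical name.

```python
import json


def match_log_line(pattern: str, action: str, match_count: int, request_id: str) -> str:
    """Build a sanitize.match log line; matched text is never a parameter."""
    record = {
        "level": "INFO",
        "msg": "sanitize.match",
        "pattern": pattern,
        "action": action,
        "match_count": match_count,
        "request_id": request_id,
    }
    # Compact separators reproduce the single-line JSON format.
    return json.dumps(record, separators=(",", ":"))
```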

Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| yaagents_plugin_sanitize_scan_total | counter | outcome | Cumulative scans by outcome (pass, redacted, rejected). |
| yaagents_plugin_sanitize_pattern_hits_total | counter | pattern, action | Cumulative pattern fires by pattern name and action. Useful for tuning pattern coverage. |
| yaagents_plugin_sanitize_body_bytes_scanned_total | counter | | Total bytes scanned (up to max_body_bytes per request). |

Correlation-id propagation

Reads X-Correlation-ID from the inbound request and attaches it as the correlation_id attribute on sanitize.scan and sanitize.match spans. Forwarded unchanged to the upstream on the redact path (not applicable on the reject path — request is terminated before reaching upstream).

Failure modes

| Failure | Configurable behavior | What the client sees |
|---|---|---|
| reject-action pattern match | on_match_reject_status (default 412) | 412 application/vnd.yaagents.clarification+json with requiredInputs listing the pattern names that fired. |
| redact-action pattern match | Fixed: body modified and forwarded | 200 from upstream (redaction is transparent to client). INFO log emitted. |
| Body exceeds max_body_bytes | Fixed: scan cap applied; tail forwarded unscanned | No error to client; tail of body reaches upstream unredacted. Monitor yaagents_plugin_sanitize_body_bytes_scanned_total to detect frequent cap hits. |
| Misconfigured regex (non-RE2) | Fixed: Init error; gateway exits with status 1 | Gateway does not start. |
| No patterns configured | Plugin is a pass-through (zero scan overhead) | All requests forwarded unchanged. |
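
The fail-fast behavior at Init can be sketched as a validation pass over the patterns list. This is an illustration under stated assumptions: validate_patterns and load_or_exit are hypothetical names, "exactly one of regex/keywords" is an interpretation of the config table, and Python's re accepts some expressions that Go RE2 rejects (for example backreferences), so this check is looser than the real one.

```python
import re
import sys


def validate_patterns(patterns: list) -> list:
    """Return a list of config errors; empty means the patterns are usable."""
    errors = []
    seen = set()
    for i, p in enumerate(patterns):
        name = p.get("name")
        if not name:
            errors.append(f"patterns[{i}]: name is required")
        elif name in seen:
            errors.append(f"patterns[{i}]: duplicate name {name!r}")
        seen.add(name)
        # Assumption: a pattern sets regex or keywords, never both or neither.
        if ("regex" in p) == ("keywords" in p):
            errors.append(f"patterns[{i}]: set exactly one of regex/keywords")
        action = p.get("action")
        if action is not None and action not in ("reject", "redact"):
            errors.append(f"patterns[{i}]: action must be reject or redact")
        if "regex" in p:
            try:
                re.compile(p["regex"])
            except re.error as exc:
                errors.append(f"patterns[{i}]: invalid regex: {exc}")
    return errors


def load_or_exit(patterns: list) -> list:
    """Mirror the documented behavior: refuse to start on a bad pattern."""
    errors = validate_patterns(patterns)
    if errors:
        for err in errors:
            print(err, file=sys.stderr)
        sys.exit(1)
    return patterns
```

An empty patterns list validates cleanly, which matches the pass-through row in the table.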