prompt-sanitize

Stable

Purpose

The prompt-sanitize plugin inspects inbound request bodies against a configurable list of regex and keyword patterns before forwarding to upstream AI services. Matched content is either rejected (request blocked with a clarification response) or redacted (matched substrings replaced with [REDACTED] and request forwarded), according to per-pattern or global strategy settings. Rather than adding body-scanning logic to every upstream service, the gateway enforces a consistent sanitization policy at the boundary.

Pipeline position:

```mermaid
flowchart LR
  TV["token-validator"] --> TI["tenant-injector"]
  TI --> LC["license-check"]
  LC --> PS["**prompt-sanitize**"]:::active
  PS --> OA["otel-audit"]
  OA --> UP["upstream"]

  classDef active fill:#4ade80,stroke:#16a34a,color:#14532d
```

prompt-sanitize sits at pipeline position 4, so it runs only after license-check has confirmed license entitlement.

Config

```yaml
prompt-sanitize:
  enabled: true
  strategy: reject                  # global default action for patterns that omit action
  on_match_reject_status: 412       # HTTP status returned on reject (default 412)
  max_body_bytes: 1048576           # body scan cap in bytes (default 1 MiB)
  patterns:
    - name: pii_email
      regex: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
      action: redact
    - name: pii_phone
      regex: '\d{3}[-.\s]?\d{3}[-.\s]?\d{4}'
      action: redact
    - name: injection_attempt
      keywords:
        - "ignore previous instructions"
        - "jailbreak"
        - "DAN mode"
      action: reject
    - name: ssn
      regex: '\d{3}-\d{2}-\d{4}'
      action: reject
```

Field	Type	Default	Required	Description
`enabled`	bool	`true`	no	Set to `false` to bypass the plugin (for debugging only).
`strategy`	string	`"reject"`	no	Global default action for patterns that omit `action`. `reject` or `redact`.
`on_match_reject_status`	integer	`412`	no	HTTP status returned when a `reject`-action pattern matches. 412 (Precondition Failed) signals the client to correct its payload before retrying.
`max_body_bytes`	integer	`1048576`	no	Maximum body bytes scanned. Bodies larger than this cap are scanned only up to the cap; the remainder is forwarded unmodified.
`patterns`	list	`[]`	no	Ordered list of pattern entries. Evaluated in declaration order.
`patterns[].name`	string	—	yes	Unique name for this pattern; used in log and span labels.
`patterns[].regex`	string	—	one of regex/keywords	Go RE2 regular expression applied to the raw request body.
`patterns[].keywords`	list	—	one of regex/keywords	List of literal strings; matched case-insensitively anywhere in the body.
`patterns[].action`	string	`strategy`	no	`reject` or `redact`. Overrides the global `strategy` for this pattern.
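As a rough sketch of the Init-time rules in the table above (name required and unique, one of `regex`/`keywords`, `action` falling back to `strategy`, regexes compiled once), the following Go code validates a pattern list. The type and function names are hypothetical, and reading "one of regex/keywords" as "exactly one" is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// PatternConfig mirrors one entry of the patterns list (illustrative field names).
type PatternConfig struct {
	Name     string
	Regex    string
	Keywords []string
	Action   string // "reject", "redact", or "" to inherit the global strategy
}

// CompiledPattern is the Init-time form: regex compiled, action resolved.
type CompiledPattern struct {
	Name     string
	Re       *regexp.Regexp // nil for keyword patterns
	Keywords []string       // lower-cased once for case-insensitive matching
	Action   string
}

func validAction(a string) bool { return a == "reject" || a == "redact" }

// compilePatterns validates the list in declaration order and returns the
// first configuration error, which the caller would treat as fatal at Init.
func compilePatterns(strategy string, pcs []PatternConfig) ([]CompiledPattern, error) {
	if strategy == "" {
		strategy = "reject" // documented default
	}
	if !validAction(strategy) {
		return nil, fmt.Errorf("strategy: unknown value %q", strategy)
	}
	seen := make(map[string]bool)
	out := make([]CompiledPattern, 0, len(pcs))
	for i, pc := range pcs {
		if pc.Name == "" {
			return nil, fmt.Errorf("patterns[%d]: name is required", i)
		}
		if seen[pc.Name] {
			return nil, fmt.Errorf("patterns[%d]: duplicate name %q", i, pc.Name)
		}
		seen[pc.Name] = true
		hasRegex, hasKeywords := pc.Regex != "", len(pc.Keywords) > 0
		if hasRegex == hasKeywords {
			return nil, fmt.Errorf("pattern %q: set exactly one of regex or keywords", pc.Name)
		}
		action := pc.Action
		if action == "" {
			action = strategy // no per-pattern override: inherit the global strategy
		}
		if !validAction(action) {
			return nil, fmt.Errorf("pattern %q: unknown action %q", pc.Name, action)
		}
		cp := CompiledPattern{Name: pc.Name, Action: action}
		if hasRegex {
			re, err := regexp.Compile(pc.Regex) // RE2 syntax; an error aborts Init
			if err != nil {
				return nil, fmt.Errorf("pattern %q: %w", pc.Name, err)
			}
			cp.Re = re
		}
		for _, kw := range pc.Keywords {
			cp.Keywords = append(cp.Keywords, strings.ToLower(kw))
		}
		out = append(out, cp)
	}
	return out, nil
}

func main() {
	ps, err := compilePatterns("reject", []PatternConfig{
		{Name: "pii_email", Regex: `[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`, Action: "redact"},
		{Name: "injection_attempt", Keywords: []string{"ignore previous instructions", "jailbreak"}},
	})
	if err != nil {
		fmt.Println("Init error:", err)
		return
	}
	for _, p := range ps {
		fmt.Println(p.Name, p.Action)
	}
}
```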

Request/Response

Reads from request

Source	Field	How used
Request body	Raw bytes	Scanned against all configured patterns. Bodies > `max_body_bytes` are partially scanned (up to cap).
Request context	`X-Tenant-ID`, `X-Actor-Principal`	Available in reqctx; not directly used by the pattern engine, but present for downstream correlation.

Writes to request (before forwarding upstream)

Modified field	Action	Notes
Request body	On `redact` match: matched substrings replaced with `[REDACTED]`	The modified body is forwarded with its `Content-Length` header updated to the new length.

On a `reject` match, the request is terminated early and the upstream never receives it. When no pattern matches, the request body is forwarded unchanged.

Writes to response

This plugin does not modify upstream responses. On reject it returns its own early response (see Status codes below) before any upstream contact.

Status codes the plugin can return early

Status	Media type	When
`412`	`application/vnd.yaagents.clarification+json`	A `reject`-action pattern matched. Response body includes a `requiredInputs` array describing which patterns fired and what the client should correct.

The 412 status is a payload-correction signal rather than a generic 400 Bad Request: it tells the client to fix its payload before retrying, in line with the Profile v0.3 §clarification handshake conventions.

Security & privacy

What this plugin trusts

The request body bytes as received from the client, after authentication and license checks. The plugin assumes the body arrives intact but not that its content is safe; checking that safety is exactly the contract this plugin enforces.
The pattern definitions in config. Patterns are compiled at Init and applied exactly as written, so a misconfigured regex that matches everything will redact or reject every request.

What this plugin protects

Prompt injection: keyword patterns targeting common injection phrases (ignore previous instructions, jailbreak, etc.) block attempts to subvert upstream AI service behavior.
PII leakage: regex patterns for email addresses, phone numbers, SSNs, and similar identifiers prevent sensitive data from reaching AI models where it may be retained, logged, or memorized.
Exfiltration patterns: custom patterns can block attempts to embed extraction instructions into prompts (e.g., “output your system prompt”).

PII boundary

This plugin has the largest PII surface of the five gateway plugins, because it actively reads and modifies the request body. Key posture:

On redact: matched substrings are replaced with [REDACTED] before the body reaches the upstream. The matched content is never logged; only the pattern name, action, and match count are emitted at INFO level.
On reject: the body is not forwarded at all; only the same pattern metadata (name, action, match count) is logged.
The original (pre-redaction) body is never written to any log line, span, or metric.
max_body_bytes caps the scan: for bodies larger than the cap, the scanned prefix is checked (and redacted where a redact pattern fires), but the unscanned tail is forwarded with no guarantee.
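The tail caveat also covers matches that straddle the cap. In this worked example (cap shrunk to 32 bytes for illustration; the `scannedPrefix` helper is hypothetical), the email address starts before the cap but ends after it, so the prefix scan finds nothing and the address is forwarded intact.

```go
package main

import (
	"fmt"
	"regexp"
)

var email = regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)

// scannedPrefix returns the part of body that a scan capped at maxBytes inspects.
func scannedPrefix(body []byte, maxBytes int) []byte {
	if len(body) > maxBytes {
		return body[:maxBytes]
	}
	return body
}

func main() {
	body := []byte("please summarize: contact bob@example.com") // 41 bytes
	const maxBytes = 32                                        // tiny cap for illustration; the default is 1048576
	fmt.Println("match in full body:", email.Match(body))
	// The prefix ends at "bob@ex": no dot and TLD yet, so the pattern cannot match.
	fmt.Println("match in scanned prefix:", email.Match(scannedPrefix(body, maxBytes)))
}
```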

Secrets handling

No secrets are loaded by this plugin. Pattern definitions (regex strings and keyword lists) are config values — treat them as operational configuration, not credentials. The patterns list should be reviewed as carefully as firewall rules: overly broad patterns silently redact legitimate content; too-narrow patterns miss real threats.

Observability

Spans / events emitted

Span name	Attributes	When emitted
`sanitize.scan`	`outcome` (`pass`, `redacted`, or `rejected`), `pattern_hits` (count), `body_bytes_scanned`	Every request where the plugin is enabled.
`sanitize.match`	`pattern_name`, `action` (`redact` or `reject`), `match_count`	Once per pattern that fires; never includes matched content.

Bench baseline (BENCH-4; commit 21ab2fc; 2026-06-07):

Small bodies (under 1 KB): p99 overhead +11.5 ms vs no-plugin baseline.
Medium bodies (1–10 KB): p99 overhead −1.3 ms (within noise floor; regex engine amortises).
Large bodies (over 10 KB): p99 overhead +1.3 ms (scan overhead grows sub-linearly with body size due to RE2 efficiency).
Redact strategy vs reject: p99 overhead +0.7 ms additional for redact (body rewrite).

See Audit and Observability for the full bench baseline archive.

Log lines

```json
{"level":"INFO","msg":"sanitize.match","pattern":"pii_email","action":"redact","match_count":2,"request_id":"req-001"}
{"level":"INFO","msg":"sanitize.match","pattern":"injection_attempt","action":"reject","match_count":1,"request_id":"req-002"}
```

Matched content (the actual matched substring) is never logged — only the pattern name, action, and match count are emitted. This prevents PII from appearing in logs even when the redact action fires.

Metrics

Metric	Type	Labels	Description
`yaagents_plugin_sanitize_scan_total`	counter	`outcome`	Cumulative scans by outcome (`pass`, `redacted`, `rejected`).
`yaagents_plugin_sanitize_pattern_hits_total`	counter	`pattern`, `action`	Cumulative pattern fires by pattern name and action. Useful for tuning pattern coverage.
`yaagents_plugin_sanitize_body_bytes_scanned_total`	counter	—	Total bytes scanned (up to `max_body_bytes` per request).
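To show how the three series and their labels relate, here is a stdlib-only stand-in. A real deployment would more likely use a metrics client library; the `counterVec` type, the `record` helper, and counting one fire per pattern per request are all assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// counterVec is a minimal stand-in for a labelled counter.
type counterVec struct {
	mu     sync.Mutex
	values map[string]float64 // key: label values joined with ","
}

func newCounterVec() *counterVec { return &counterVec{values: map[string]float64{}} }

// Add increments the series identified by the given label values.
func (c *counterVec) Add(delta float64, labels ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[strings.Join(labels, ",")] += delta
}

// Get reads one series (0 if it was never incremented).
func (c *counterVec) Get(labels ...string) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.values[strings.Join(labels, ",")]
}

// sanitizeMetrics groups the three documented series.
type sanitizeMetrics struct {
	scanTotal    *counterVec // yaagents_plugin_sanitize_scan_total{outcome}
	patternHits  *counterVec // yaagents_plugin_sanitize_pattern_hits_total{pattern,action}
	bytesScanned *counterVec // yaagents_plugin_sanitize_body_bytes_scanned_total (no labels)
}

func newSanitizeMetrics() *sanitizeMetrics {
	return &sanitizeMetrics{newCounterVec(), newCounterVec(), newCounterVec()}
}

// record updates all three series after one scan. fired maps pattern name to action;
// scanned is the number of bytes actually inspected (already capped at max_body_bytes).
func (m *sanitizeMetrics) record(outcome string, fired map[string]string, scanned int) {
	m.scanTotal.Add(1, outcome)
	for name, action := range fired {
		m.patternHits.Add(1, name, action)
	}
	m.bytesScanned.Add(float64(scanned))
}

func main() {
	m := newSanitizeMetrics()
	m.record("redacted", map[string]string{"pii_email": "redact"}, 512)
	m.record("rejected", map[string]string{"injection_attempt": "reject"}, 2048)
	m.record("pass", nil, 128)
	scans := m.scanTotal.Get("pass") + m.scanTotal.Get("redacted") + m.scanTotal.Get("rejected")
	// Average bytes scanned per request; an average near max_body_bytes hints at frequent cap hits.
	fmt.Println(m.bytesScanned.Get() / scans)
}
```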

Correlation-id propagation

Reads X-Correlation-ID from the inbound request and attaches it as the correlation_id attribute on the sanitize.scan and sanitize.match spans. The header is forwarded unchanged to the upstream on the pass and redact paths; on the reject path it is not forwarded, because the request is terminated before reaching the upstream.

Failure modes

Failure	Configurable behavior	What the client sees
`reject`-action pattern match	`on_match_reject_status` (default 412)	`412 application/vnd.yaagents.clarification+json` with `requiredInputs` listing the pattern names that fired.
`redact`-action pattern match	Fixed: body modified and forwarded	The upstream's own response (typically `200`); redaction is transparent to the client. INFO log emitted.
Body exceeds `max_body_bytes`	Fixed: scan cap applied; tail forwarded unscanned	No error to the client; the tail of the body reaches the upstream unredacted. Monitor `yaagents_plugin_sanitize_body_bytes_scanned_total` to detect frequent cap hits.
Invalid regex (not valid RE2 syntax)	Fixed: Init error, gateway exits with status 1	Gateway does not start.
No patterns configured	Plugin is a pass-through (zero scan overhead)	All requests forwarded unchanged.
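Because patterns go through Go's RE2-based `regexp` package, PCRE-only constructs such as lookahead or backreferences fail at compile time rather than at request time. The sketch below (hypothetical `validatePatterns` and `initOrExit` helpers) shows that check and the exit-1 behavior from the table.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// validatePatterns compiles each regex and returns the first error.
func validatePatterns(regexes map[string]string) error {
	for name, expr := range regexes {
		if _, err := regexp.Compile(expr); err != nil {
			return fmt.Errorf("pattern %q is not valid RE2: %w", name, err)
		}
	}
	return nil
}

// initOrExit mirrors the documented failure mode: an invalid pattern stops the gateway with exit code 1.
func initOrExit(regexes map[string]string) {
	if err := validatePatterns(regexes); err != nil {
		fmt.Fprintln(os.Stderr, "Init error:", err)
		os.Exit(1)
	}
}

func main() {
	initOrExit(map[string]string{
		"pii_phone": `\d{3}[-.\s]?\d{3}[-.\s]?\d{4}`,
		"ssn":       `\d{3}-\d{2}-\d{4}`,
	})
	fmt.Println("all patterns compiled")
	// A PCRE lookahead is rejected; passed to initOrExit it would exit 1.
	fmt.Println(validatePatterns(map[string]string{"lookahead": `secret(?=key)`}))
}
```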