Governance Pipeline

Every request and response flows through the same pipeline. Steps run in order. Each step can inspect, modify, or block — and every decision lands in the audit trail.

Shape of a call

flowchart LR
    A[Agent] -->|tp_ key| G[Gateway]
    G --> I[Input steps]
    I --> P[Policy / OPA]
    P --> C[Credentials]
    C --> L[LLM]
    L --> O[Output steps]
    O --> AU[Audit]
    AU -->|response| A

The boxes labelled Input steps and Output steps are where the 49 steps live. Policy decides go/no-go using their findings.
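To make the ordering concrete, here is a minimal Python sketch of a pipeline that runs steps in sequence and lets a policy function pick the action for each finding. The `Detection`, `PipelineResult`, `run_pipeline`, and `toy_*` names are hypothetical and are not part of the product API. For brevity the sketch records only runs that produce a finding.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Detection:
    step: str
    category: str
    severity: str
    modified_text: Optional[str] = None  # set when the step proposes a rewrite


# Hypothetical step signature: takes the current text, returns a finding or None.
Step = Callable[[str], Optional[Detection]]


@dataclass
class PipelineResult:
    text: str
    blocked: bool
    audit: list = field(default_factory=list)


def run_pipeline(text: str, steps: list[Step],
                 decide: Callable[[Detection], str]) -> PipelineResult:
    """Run steps in order; the policy callback picks the action per finding."""
    result = PipelineResult(text=text, blocked=False)
    for step in steps:
        detection = step(result.text)
        if detection is None:
            continue
        action = decide(detection)
        # Record the decision whatever the outcome.
        result.audit.append((detection.step, detection.category, action))
        if action == "block":
            result.blocked = True
            break
        if action == "redact" and detection.modified_text is not None:
            result.text = detection.modified_text
    return result


# Toy stand-ins for detect_secrets and detect_injection (not the real detectors).
def toy_detect_secrets(text: str) -> Optional[Detection]:
    if "sk_live" in text:
        return Detection("detect_secrets", "secret", "high",
                         text.replace("sk_live", "[REDACTED]"))
    return None


def toy_detect_injection(text: str) -> Optional[Detection]:
    if "ignore previous instructions" in text.lower():
        return Detection("detect_injection", "injection", "critical")
    return None


def toy_policy(d: Detection) -> str:
    return {"secret": "redact", "injection": "block"}.get(d.category, "notify")
```

Calling `run_pipeline("key sk_live here", [toy_detect_secrets, toy_detect_injection], toy_policy)` returns the redacted text with one audit entry, while an injection attempt stops the loop at the blocking step.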

Pre-LLM (input side)

Category	Steps	What they catch
Input validation	`validate_input`, `posture_check`	Malformed payloads, unregistered agents
Data protection	`detect_pii`, `detect_secrets`, `pii_tokenize`	Emails, SSNs, card numbers, API keys, tokens
Threat detection	`detect_injection`, `detect_exfiltration`, `detect_escalation`, `detect_flooding`, `detect_social_engineering`, `detect_insider_threat`, `detect_boundary_escape`, `detect_unicode`, `detect_memory_poison`	Prompt injection, data exfiltration, privilege escalation, unicode homoglyphs, memory poisoning
Tool governance	`filter_tools`, `tool_permissions`, `tool_constraints`, `scan_tool_calls`, `verify_tool_governance`, `taint_check`	Unauthorised tools, capability-token violations, taint propagation
Content safety	`content_safety`, `classify_data`, `inspect_images`, `detect_multimodal`	Moderation hits, data classification, image-based prompt injection
Rate limiting	`rate_limit`, `budget_enforcement`, `session_budget`	Per-agent / per-org limits, token and cost caps
Session controls	`session_escalation`, `session_loop_guard`, `loop_guard`	Agents that escalate or loop
Infrastructure	`model_routing`, `detect_infra`, `detect_code_exec`, `shell_bleed`	Model routing rules, infrastructure leakage, code-execution smell

LLM call

`call_llm` is the single step that forwards the (possibly modified) request through the credential vault to the chosen provider. Your agent never sees the `sk_` key.
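The key swap can be pictured with a small Python sketch. It assumes the agent sends its `tp_` key as a bearer token and that the vault is a simple lookup; the `VAULT` dictionary, the header layout, and `build_upstream_headers` are illustrative assumptions, not the gateway's actual interface.

```python
# Hypothetical in-memory vault: agent tp_ key -> provider sk_ key.
VAULT = {"tp_agent_123": "sk_provider_secret"}


def build_upstream_headers(incoming: dict) -> dict:
    """Swap the agent's tp_ key for the provider key before forwarding."""
    auth = incoming.get("Authorization", "")
    agent_key = auth.removeprefix("Bearer ").strip()
    if not agent_key.startswith("tp_") or agent_key not in VAULT:
        raise PermissionError("unknown agent key")
    upstream = dict(incoming)  # copy, so the caller's headers are untouched
    upstream["Authorization"] = f"Bearer {VAULT[agent_key]}"
    return upstream
```

The provider key exists only in the upstream headers built on the gateway side, so it is never returned to the agent.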

Post-LLM (output side)

Category	Steps	What they do
Output scanning	`scan_output`, `scan_tool_results`, `redact_tool_results`	Scan model outputs for PII, secrets, policy violations
Integrity	`verify_claims`, `verify_artifact`, `dedup_output`	Verify factual claims, artifact integrity, drop duplicate responses
Restoration	`pii_restore`, `constrain_output`	Put tokenized PII back for authorised consumers, enforce format constraints
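
The tokenize-then-restore round trip behind `pii_tokenize` and `pii_restore` can be sketched as follows. This is an illustration only: it handles emails with a simple regular expression, and the placeholder format, the `authorised` flag, and both function names are assumptions, not the product's implementation.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each email with a placeholder; return the text and the mapping."""
    mapping: dict[str, str] = {}   # placeholder -> original value
    seen: dict[str, str] = {}      # original value -> placeholder

    def swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            token = f"<PII_EMAIL_{len(seen) + 1}>"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    return EMAIL_RE.sub(swap, text), mapping


def restore(text: str, mapping: dict[str, str], authorised: bool) -> str:
    """Put the original values back only for authorised consumers."""
    if not authorised:
        return text
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

In this sketch the model only ever sees the placeholders; the mapping stays on the gateway side until the output steps run.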

How a step decides

Every step produces a detection with a category, severity, and optional modification. The detection is handed to policy (OPA / Rego), which chooses one of four actions:

Action	Meaning
`block`	Reject the request; return a `PolicyBlockError` to the agent
`redact`	Strip the sensitive content in-place; continue
`notify`	Log the finding; continue
`allow`	Take no action; treat the finding as clean

Action picking lives in policy, not in the step. That’s why two customers can run the same step set with different behaviours.
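
A minimal Python sketch of that separation: the detection stays the same and only the policy table differs. Real policy is written in Rego and evaluated by OPA; the dictionaries here are a stand-in, and the org names, category labels, and the `notify` fallback are assumptions.

```python
VALID_ACTIONS = {"block", "redact", "notify", "allow"}

# Hypothetical per-customer policy tables: detection category -> action.
POLICIES = {
    "org_strict": {"pii": "block", "injection": "block"},
    "org_lenient": {"pii": "redact", "injection": "block", "moderation": "allow"},
}


def choose_action(org: str, detection: dict) -> str:
    """Map a step's detection to one of the four actions for a given org."""
    action = POLICIES[org].get(detection["category"], "notify")  # assumed default
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return action


# The same detection, produced by the same step, yields different outcomes.
finding = {"step": "detect_pii", "category": "pii", "severity": "medium"}
print(choose_action("org_strict", finding))   # block
print(choose_action("org_lenient", finding))  # redact
```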

Configuration shape

Every step can be tuned per-org:

{
  "steps": {
    "detect_pii": {
      "enabled": true,
      "on_detection": "redact",
      "threshold": 0.8
    },
    "detect_injection": {
      "enabled": true,
      "on_detection": "block",
      "threshold": 0.6
    },
    "rate_limit": {
      "enabled": true,
      "max_calls": 100,
      "window_seconds": 3600
    }
  }
}

Policy presets live in `config/policies/{default,alpha,beta,acme}.yaml` on the server. Per-customer Rego overrides go in `config/policies/rego/`.
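
One plausible reading of the `enabled`, `on_detection`, and `threshold` fields is sketched below in Python. It assumes `threshold` is a minimum confidence between 0 and 1 and that disabled, unknown, or below-threshold findings resolve to `allow`; the exact semantics are not documented here, and `configured_action` is a hypothetical helper.

```python
import json

ORG_CONFIG = json.loads("""
{
  "steps": {
    "detect_pii": {"enabled": true, "on_detection": "redact", "threshold": 0.8},
    "detect_injection": {"enabled": true, "on_detection": "block", "threshold": 0.6}
  }
}
""")


def configured_action(config: dict, step: str, score: float) -> str:
    """Return the configured action when a finding's score meets the threshold.

    Assumption: `threshold` is a minimum confidence in [0, 1]; steps that are
    disabled, unknown, or below threshold resolve to "allow".
    """
    step_cfg = config["steps"].get(step)
    if not step_cfg or not step_cfg.get("enabled", False):
        return "allow"
    if score < step_cfg.get("threshold", 0.0):
        return "allow"
    return step_cfg["on_detection"]
```

Under these assumptions a PII finding scored 0.85 is redacted, while one scored 0.5 passes through untouched.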

Capability tokens for tool calls

For agents with tool access, the tool-governance steps verify a capability token — a signed, time-bound credential that names exactly which tools the agent can call and with what constraints.

Property	What it gives you
Ed25519 signature	Offline verification at the tool executor (~27 µs)
Proof-of-Possession	A stolen token is useless without the holder’s private key
Monotonic attenuation	Delegated tokens can only narrow, never widen
Rich constraints	URL-safe paths, regex, range limits, CEL expressions
Delegation chains	Up to 64 levels, cryptographically linked
Trust score	Agent reputation baked into the token

Full spec in Architecture → Capability tokens.
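
Monotonic attenuation is the easiest of these properties to show in code. The Python sketch below leaves out signatures and proof-of-possession entirely and checks only that a delegated token narrows its parent. The `Token` fields and the `delegate` helper are hypothetical, and treating the root token as depth 0 is an assumption about how the 64-level limit is counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    tools: frozenset   # tool names the holder may call
    expires_at: int    # Unix timestamp
    depth: int = 0     # position in the delegation chain (root = 0, assumed)


MAX_DEPTH = 64  # delegation limit from the table above


def delegate(parent: Token, tools: set, expires_at: int) -> Token:
    """Derive a child token that can only narrow the parent's authority."""
    if not set(tools) <= parent.tools:
        raise ValueError("child requests tools the parent does not hold")
    if expires_at > parent.expires_at:
        raise ValueError("child cannot outlive its parent")
    if parent.depth + 1 > MAX_DEPTH:
        raise ValueError("delegation chain too deep")
    return Token(frozenset(tools), expires_at, parent.depth + 1)
```

A child holding only `search` cannot later delegate `send_email`, because every check is made against the immediate parent rather than the root.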

Trust score in the loop

Every agent has a trust score (0–1000) computed from its audit history. Five dimensions:

Dimension	Weight	What it measures
Compliance	30%	Pass rate without blocks
Data safety	25%	PII and secret handling
Security	20%	Threat detection rate
Stability	15%	Behavioural consistency
Efficiency	10%	Cost management
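
As a worked example of the weighting, the Python sketch below combines the five dimensions into a 0–1000 score. It assumes each dimension is already normalised to the range 0.0 to 1.0 and that the result is a plain weighted sum; how the real dimension values are derived from audit history is not covered here, and `trust_score` is a hypothetical helper.

```python
WEIGHTS = {
    "compliance": 0.30,
    "data_safety": 0.25,
    "security": 0.20,
    "stability": 0.15,
    "efficiency": 0.10,
}


def trust_score(dimensions: dict) -> int:
    """Combine five 0.0-1.0 dimension scores into a 0-1000 trust score."""
    if set(dimensions) != set(WEIGHTS):
        raise ValueError("expected exactly the five dimensions")
    for name, value in dimensions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range")
    weighted = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    return round(weighted * 1000)


example = {
    "compliance": 0.9,
    "data_safety": 0.8,
    "security": 1.0,
    "stability": 0.7,
    "efficiency": 0.5,
}
# 0.27 + 0.20 + 0.20 + 0.105 + 0.05 = 0.825
print(trust_score(example))  # 825
```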

Trust score → clearance level → which tools the capability token can include. A misbehaving agent automatically loses privileged tool access; a clean agent can be given more autonomy over time.

Score	Clearance	Meaning
950–1000	`SYSTEM`	Fully autonomous
800–949	`PRIVILEGED`	Trusted, minimal oversight
600–799	`INTERNAL`	Standard
400–599	`PARTNER`	Limited tool access
200–399	`EXTERNAL`	Restricted, monitored
0–199	`UNTRUSTED`	Observe only
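
The band lookup in the table above can be expressed as a short Python sketch. The bounds and level names come straight from the table; the `clearance` function name is illustrative.

```python
import bisect

# Lower bound of each band, ascending, with the matching clearance level.
LOWER_BOUNDS = [0, 200, 400, 600, 800, 950]
LEVELS = ["UNTRUSTED", "EXTERNAL", "PARTNER", "INTERNAL", "PRIVILEGED", "SYSTEM"]


def clearance(score: int) -> str:
    """Map a 0-1000 trust score to its clearance level."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be between 0 and 1000")
    # bisect_right counts the lower bounds that are <= score.
    return LEVELS[bisect.bisect_right(LOWER_BOUNDS, score) - 1]
```

For example, `clearance(825)` returns `"PRIVILEGED"` and `clearance(199)` returns `"UNTRUSTED"`.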

Audit trail

Every pipeline run — every detection, every policy verdict, every LLM call, every tool invocation — writes an event into a hash-chained, signed audit trail. That trail is the source of truth for Compliance Evidence.
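
To show why a hash-chained trail is tamper-evident, here is a minimal Python sketch. Each event's hash covers the previous event's hash, so editing any earlier event invalidates everything after it. The event layout, the genesis value, and the use of HMAC-SHA256 as the signature are assumptions for illustration; the actual trail format and signing scheme are not specified here.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def append_event(trail: list, payload: dict, key: bytes) -> dict:
    """Append an event whose hash covers the previous event's hash."""
    prev_hash = trail[-1]["hash"] if trail else GENESIS
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    event = {
        "prev": prev_hash,
        "payload": payload,
        "hash": hashlib.sha256(body.encode()).hexdigest(),
        # HMAC stands in for whatever signature scheme the real trail uses.
        "sig": hmac.new(key, body.encode(), hashlib.sha256).hexdigest(),
    }
    trail.append(event)
    return event


def verify(trail: list, key: bytes) -> bool:
    """Recompute every hash and signature; any edit breaks the chain."""
    prev_hash = GENESIS
    for event in trail:
        body = json.dumps({"prev": prev_hash, "payload": event["payload"]},
                          sort_keys=True)
        expected_hash = hashlib.sha256(body.encode()).hexdigest()
        expected_sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
        if event["prev"] != prev_hash or event["hash"] != expected_hash:
            return False
        if not hmac.compare_digest(event["sig"], expected_sig):
            return False
        prev_hash = event["hash"]
    return True
```

Changing a recorded action from `block` to `allow` after the fact makes `verify` return `False`, because the recomputed hash no longer matches the stored one.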

What you should do next

Tune per-request behaviour with Governance Flags
Export the audit trail as Compliance Evidence
Read Architecture for the full system diagram, proxy pattern, and token internals