Governance Pipeline

Every request and response flows through the same pipeline. Steps run in order. Each step can inspect, modify, or block — and every decision lands in the audit trail.

flowchart LR
A[Agent] -->|tp_ key| G[Gateway]
G --> I[Input steps]
I --> P[Policy / OPA]
P --> C[Credentials]
C --> L[LLM]
L --> O[Output steps]
O --> AU[Audit]
AU -->|response| A

The boxes labelled Input steps and Output steps are where the 49 steps live. Policy makes the go/no-go decision using their findings.
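The ordering described above can be pictured with a minimal sketch. It assumes each step is a plain function that returns a result object; `StepResult`, `run_pipeline`, and the two toy steps are hypothetical names for illustration, not the gateway's real API. For brevity the sketch lets a step set `blocked` directly, whereas in the real pipeline that verdict comes from policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    step: str
    blocked: bool = False
    modified: Optional[str] = None  # replacement payload, if the step changed it
    note: str = ""


Step = Callable[[str], StepResult]


def run_pipeline(payload: str, steps: List[Step], audit: List[StepResult]) -> Optional[str]:
    """Run steps in order; return the final payload, or None if a step blocked."""
    for step in steps:
        result = step(payload)
        audit.append(result)  # every decision is recorded, including clean passes
        if result.blocked:
            return None
        if result.modified is not None:
            payload = result.modified
    return payload


# Two toy steps, hypothetical stand-ins for the real detectors.
def validate_input(payload: str) -> StepResult:
    return StepResult("validate_input", blocked=(payload.strip() == ""), note="empty payload check")


def detect_secrets(payload: str) -> StepResult:
    if "sk_" in payload:
        redacted = payload.replace("sk_", "[REDACTED]_")
        return StepResult("detect_secrets", modified=redacted, note="key prefix found")
    return StepResult("detect_secrets")
```

Because the audit list is appended to before the block check, a blocked request still leaves a record of the step that stopped it.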

| Category | Steps | What they catch |
| --- | --- | --- |
| Input validation | validate_input, posture_check | Malformed payloads, unregistered agents |
| Data protection | detect_pii, detect_secrets, pii_tokenize | Emails, SSNs, card numbers, API keys, tokens |
| Threat detection | detect_injection, detect_exfiltration, detect_escalation, detect_flooding, detect_social_engineering, detect_insider_threat, detect_boundary_escape, detect_unicode, detect_memory_poison | Prompt injection, data exfiltration, privilege escalation, unicode homoglyphs, memory poisoning |
| Tool governance | filter_tools, tool_permissions, tool_constraints, scan_tool_calls, verify_tool_governance, taint_check | Unauthorised tools, capability-token violations, taint propagation |
| Content safety | content_safety, classify_data, inspect_images, detect_multimodal | Moderation hits, data classification, image-based prompt injection |
| Rate limiting | rate_limit, budget_enforcement, session_budget | Per-agent / per-org limits, token and cost caps |
| Session controls | session_escalation, session_loop_guard, loop_guard | Agents that escalate or loop |
| Infrastructure | model_routing, detect_infra, detect_code_exec, shell_bleed | Model routing rules, infrastructure leakage, code-execution smell |

call_llm is the single step that forwards the (possibly modified) request through the credential vault to the chosen provider. Your agent never sees the sk_ key.
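As a rough illustration of the key swap, the sketch below assumes an in-memory vault that maps gateway (`tp_`) keys to provider (`sk_`) keys. `CredentialVault`, the `send` callable, and the `authorization` field are hypothetical; the real vault and transport are not described here.

```python
from typing import Callable, Dict


class CredentialVault:
    """Hypothetical in-memory vault mapping gateway keys to provider keys."""

    def __init__(self, mapping: Dict[str, str]):
        self._mapping = dict(mapping)

    def provider_key_for(self, gateway_key: str) -> str:
        if not gateway_key.startswith("tp_"):
            raise PermissionError("not a gateway key")
        try:
            return self._mapping[gateway_key]
        except KeyError:
            raise PermissionError("unknown gateway key") from None


def call_llm(gateway_key: str, request: dict, vault: CredentialVault,
             send: Callable[[dict], dict]) -> dict:
    """Attach the provider key on the way out; never return it to the agent."""
    provider_key = vault.provider_key_for(gateway_key)
    response = dict(send({**request, "authorization": provider_key}))
    response.pop("authorization", None)  # defensive: drop the key if a provider echoes it
    return response
```

The agent only ever holds the `tp_` key, so rotating or revoking the provider key does not require touching agent code.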

| Category | Steps | What they do |
| --- | --- | --- |
| Output scanning | scan_output, scan_tool_results, redact_tool_results | Scan model outputs for PII, secrets, policy violations |
| Integrity | verify_claims, verify_artifact, dedup_output | Verify factual claims, artifact integrity, drop duplicate responses |
| Restoration | pii_restore, constrain_output | Put tokenized PII back for authorised consumers, enforce format constraints |
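The pairing of pii_tokenize on input with pii_restore on output can be sketched as a round trip. This is an illustration only: it handles emails with a simple regex, keeps the token map in a plain dict, and uses a made-up `<PII:...>` token format, none of which is confirmed by the product.

```python
import re
import uuid
from typing import Dict

# Deliberately simple email pattern, good enough for a demonstration.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pii_tokenize(text: str, vault: Dict[str, str]) -> str:
    """Replace each email with an opaque token and remember the mapping."""
    def swap(match: "re.Match") -> str:
        token = f"<PII:{uuid.uuid4().hex[:8]}>"
        vault[token] = match.group(0)
        return token
    return EMAIL_RE.sub(swap, text)


def pii_restore(text: str, vault: Dict[str, str], authorised: bool) -> str:
    """Put original values back, but only for authorised consumers."""
    if not authorised:
        return text
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

The model only ever sees the tokens, so nothing sensitive reaches the provider even when the response is later restored for an authorised consumer.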

Every step produces a detection with a category, severity, and optional modification. The detection is handed to policy (OPA / Rego), which chooses one of four actions:

| Action | Meaning |
| --- | --- |
| block | Reject the request; return a PolicyBlockError to the agent |
| redact | Strip the sensitive content in-place; continue |
| notify | Log the finding; continue |
| allow | Skip; treat as clean |

Action picking lives in policy, not in the step. That’s why two customers can run the same step set with different behaviours.
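To make the split between detection and decision concrete, here is a small Python stand-in for the policy layer. The real decision is made in OPA / Rego; the `Detection` fields follow the description above, while `decide`, the category string `"pii"`, and the two policy tables are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ACTIONS = ("block", "redact", "notify", "allow")


@dataclass(frozen=True)
class Detection:
    step: str
    category: str
    severity: str  # assumed scale, e.g. "low", "medium", "high"
    modification: Optional[str] = None


def decide(detection: Detection, policy: Dict[str, str], default: str = "allow") -> str:
    """Look up the action for a detection category in a customer's policy table."""
    action = policy.get(detection.category, default)
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return action


# Same detection, two hypothetical customers with different policies.
finding = Detection(step="detect_pii", category="pii", severity="medium")
strict_policy = {"pii": "block"}
lenient_policy = {"pii": "redact"}
```

Here `decide(finding, strict_policy)` yields `"block"` and `decide(finding, lenient_policy)` yields `"redact"`, even though the step produced an identical detection in both cases.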

Every step can be tuned per-org:

{
  "steps": {
    "detect_pii": {
      "enabled": true,
      "on_detection": "redact",
      "threshold": 0.8
    },
    "detect_injection": {
      "enabled": true,
      "on_detection": "block",
      "threshold": 0.6
    },
    "rate_limit": {
      "enabled": true,
      "max_calls": 100,
      "window_seconds": 3600
    }
  }
}

Policy presets live in config/policies/{default,alpha,beta,acme}.yaml on the server. Per-customer Rego overrides go in config/policies/rego/.
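One plausible reading of the per-org settings is sketched below. It assumes `threshold` is a confidence cutoff compared with `>=`, and that `max_calls` / `window_seconds` describe a sliding window; the source states neither, and `action_for` and `SlidingWindowLimiter` are hypothetical helpers.

```python
import json
import time
from collections import deque
from typing import Optional

CONFIG = json.loads("""
{
  "steps": {
    "detect_pii": {"enabled": true, "on_detection": "redact", "threshold": 0.8},
    "rate_limit": {"enabled": true, "max_calls": 100, "window_seconds": 3600}
  }
}
""")


def action_for(step: str, score: float, config: dict) -> str:
    """Return the configured action if the score reaches the threshold, else "allow"."""
    cfg = config["steps"].get(step, {})
    if not cfg.get("enabled", False):
        return "allow"
    if score >= cfg.get("threshold", 0.0):
        return cfg.get("on_detection", "notify")
    return "allow"


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds span (assumed semantics)."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

With the values shown, a PII finding scored 0.9 would be redacted and one scored 0.5 would pass, while an agent's 101st call inside an hour would be refused.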

For agents with tool access, the tool-governance steps verify a capability token — a signed, time-bound credential that names exactly which tools the agent can call and with what constraints.

| Property | What it gives you |
| --- | --- |
| Ed25519 signature | Offline verification at the tool executor (~27 µs) |
| Proof-of-Possession | A stolen token is useless without the holder’s private key |
| Monotonic attenuation | Delegated tokens can only narrow, never widen |
| Rich constraints | URL-safe paths, regex, range limits, CEL expressions |
| Delegation chains | Up to 64 levels, cryptographically linked |
| Trust score | Agent reputation baked into the token |

Full spec in Architecture → Capability tokens.
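Monotonic attenuation is easiest to see as a subset check at delegation time. The sketch below leaves out signatures and proof-of-possession entirely and models only the narrowing rule and the depth limit; `CapabilityToken` and `delegate` are hypothetical, and treating the root as depth 0 is an assumption about how the 64 levels are counted.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DEPTH = 64  # delegation limit from the properties table; counting from 0 is assumed


@dataclass(frozen=True)
class CapabilityToken:
    tools: frozenset
    expires_at: float  # Unix timestamp
    depth: int = 0
    parent: Optional["CapabilityToken"] = None


def delegate(parent: CapabilityToken, tools: set, expires_at: float) -> CapabilityToken:
    """Derive a child token that can only narrow the parent's authority."""
    if parent.depth + 1 > MAX_DEPTH:
        raise ValueError("delegation chain too deep")
    if not set(tools) <= parent.tools:
        raise ValueError("child requests tools the parent does not hold")
    if expires_at > parent.expires_at:
        raise ValueError("child cannot outlive its parent")
    return CapabilityToken(frozenset(tools), expires_at, parent.depth + 1, parent)
```

A child holding `{"search"}` can be derived from a parent holding `{"search", "email"}`, but a request for a tool outside the parent's set, or for a later expiry, is rejected before any token is issued.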

Every agent has a trust score (0–1000) computed from its audit history. Five dimensions:

| Dimension | Weight | What it measures |
| --- | --- | --- |
| Compliance | 30% | Pass rate without blocks |
| Data safety | 25% | PII and secret handling |
| Security | 20% | Threat detection rate |
| Stability | 15% | Behavioural consistency |
| Efficiency | 10% | Cost management |
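If the score is a straightforward weighted sum, which the weights suggest but the page does not state, the arithmetic looks like the sketch below. It assumes each dimension is first normalised to a sub-score in [0, 1]; `trust_score` and the dictionary keys are hypothetical names.

```python
from typing import Dict

# Weights taken from the table above.
WEIGHTS = {
    "compliance": 0.30,
    "data_safety": 0.25,
    "security": 0.20,
    "stability": 0.15,
    "efficiency": 0.10,
}


def trust_score(dimensions: Dict[str, float]) -> int:
    """Combine per-dimension sub-scores in [0, 1] into a 0-1000 trust score."""
    if set(dimensions) != set(WEIGHTS):
        raise ValueError("expected exactly the five dimensions")
    for name, value in dimensions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    weighted = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    return round(weighted * 1000)
```

Under these assumptions, an agent with perfect compliance and 0.5 on every other dimension scores 0.30 + 0.5 × 0.70 = 0.65, that is 650.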

The trust score maps to a clearance level, and the clearance level determines which tools the capability token can include. A misbehaving agent automatically loses privileged tool access; a clean agent can be given more autonomy over time.

| Score | Clearance | Meaning |
| --- | --- | --- |
| 950–1000 | SYSTEM | Fully autonomous |
| 800–949 | PRIVILEGED | Trusted, minimal oversight |
| 600–799 | INTERNAL | Standard |
| 400–599 | PARTNER | Limited tool access |
| 200–399 | EXTERNAL | Restricted, monitored |
| 0–199 | UNTRUSTED | Observe only |
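The bands in the table translate directly into a lookup. The sketch below uses the band boundaries exactly as listed; the function name `clearance_for` is hypothetical.

```python
import bisect

# Lower bound of each band, ascending, paired with its clearance level.
_THRESHOLDS = [0, 200, 400, 600, 800, 950]
_LEVELS = ["UNTRUSTED", "EXTERNAL", "PARTNER", "INTERNAL", "PRIVILEGED", "SYSTEM"]


def clearance_for(score: int) -> str:
    """Map a 0-1000 trust score to its clearance level."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be in 0..1000")
    # bisect_right counts how many lower bounds are <= score.
    return _LEVELS[bisect.bisect_right(_THRESHOLDS, score) - 1]
```

Keeping the thresholds in one sorted list means a band can be retuned by editing a single number rather than a chain of comparisons.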

Every pipeline run — every detection, every policy verdict, every LLM call, every tool invocation — writes an event into a hash-chained, signed audit trail. That trail is the source of truth for Compliance Evidence.
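Hash chaining can be illustrated in a few lines. This sketch assumes SHA-256 over the previous hash plus a canonical JSON encoding of the event, and it omits the signing layer; the entry layout and the helper names are hypothetical rather than the product's actual format.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical encoding so the same event always hashes the same way.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: List[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the one before it, changing an old event invalidates every later entry, which is what makes such a trail usable as evidence.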