Policy Engine
How Guardion turns raw detector results and accumulated session context into a single, explainable decision — allow, flag, deny or modify.
Detection answers *“is there a risk in this content?”*. The policy engine answers *“what should happen?”*. It combines three inputs (what the detectors found, how each guardrail is configured, and the session's accumulated risk) into one decision returned on every `POST /v1/guard` call.
The four decisions
Every evaluation resolves to exactly one decision. The response fields `flagged`, `deny` and `redacted` are derived from it.
| Decision | Meaning | `flagged` | `deny` | `redacted` |
|---|---|---|---|---|
| ALLOW | Clean — nothing tripped. | false | false | false |
| FLAG | Risk detected; allow the request but surface the signal. | true | false | — |
| DENY | Risk detected; reject the request. | true | true | — |
| MODIFY | Content cleaned (e.g. redacted); request proceeds. | false | false | true |
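The table above can be read as a small lookup in both directions. The sketch below is illustrative only: the `Decision` enum and both function names are hypothetical, and it assumes that the "—" in the `redacted` column means the value depends on whether redaction also ran during that evaluation.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    DENY = "deny"
    MODIFY = "modify"


def response_flags(decision: Decision, content_redacted: bool = False) -> dict:
    """Derive the three response fields from a decision (hypothetical helper)."""
    if decision is Decision.ALLOW:
        return {"flagged": False, "deny": False, "redacted": False}
    if decision is Decision.MODIFY:
        return {"flagged": False, "deny": False, "redacted": True}
    # FLAG and DENY: the table leaves `redacted` open, so it is passed in here.
    # This is an assumption about what the "—" cells mean.
    return {
        "flagged": True,
        "deny": decision is Decision.DENY,
        "redacted": content_redacted,
    }


def decision_from_flags(flagged: bool, deny: bool, redacted: bool) -> Decision:
    """Recover the decision from the response fields, per the table rows."""
    if deny:
        return Decision.DENY
    if flagged:
        return Decision.FLAG
    if redacted:
        return Decision.MODIFY
    return Decision.ALLOW
```

A client that only needs to branch on the outcome can check `deny` first, then `flagged`, then `redacted`, which is the order `decision_from_flags` uses.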
Two ingredients — the guardrail action and the policy action
Each guardrail declares what should happen when it fires — its *expected* action. This is the per-detector signal:
| Guardrail action | Effect when it fires |
|---|---|
| `deny` | Treat as blocking; follows the policy action. |
| `follow` | Follows the policy action (block → deny, flag → flag). |
| `async` (redact) | Redact-only: cleans the content, never blocks. |
| `pass` | Flag-only: surfaces the signal, never blocks. |
The policy sets the enforcement mode via its action, which is the master switch over all blocking guardrails:
| Policy action | Effect |
|---|---|
| `block` | Enforce: a blocking guardrail produces DENY. |
| `flag` | Monitor: nothing is denied; the strongest outcome is FLAG. |
How a decision is reached
The engine evaluates signals in precedence order and the first match wins:
1. Session rules. Rules evaluated over accumulated session context (risk score, repeated flags, intent drift, bot signals). If a rule matches, it decides. See Session risk & rules.
2. Redact-only guardrails → MODIFY. If only async/redact guardrails fired (no blocking guardrail alongside), the content is cleaned and the request proceeds — independent of the policy action.
3. Detection → policy action. Any other detection follows the policy action: DENY on a block policy, FLAG on a flag policy.
4. Clean → ALLOW.
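The precedence order above can be sketched as a single function. This is a minimal illustration, not Guardion's implementation: the names `decide`, `BLOCKING` and `REDACT_ONLY` are hypothetical, and a matched session rule is represented simply by the decision it yields. Two readings are assumptions: a `pass` guardrail always yields FLAG because it never blocks, and a session rule's DENY is capped at FLAG on a flag policy, in line with the statement that a flag policy never rejects a request.

```python
from enum import Enum
from typing import Iterable, Optional


class Decision(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    DENY = "deny"
    MODIFY = "modify"


BLOCKING = {"deny", "follow"}   # guardrail actions that follow the policy action
REDACT_ONLY = {"async"}         # guardrail actions that only clean content


def decide(
    policy_action: str,                       # "block" or "flag"
    fired_actions: Iterable[str],             # actions of the guardrails that fired
    session_rule_decision: Optional[Decision] = None,
) -> Decision:
    # 1. A matched session rule decides.
    if session_rule_decision is not None:
        # Assumption: the flag-policy cap also applies to session rules.
        if policy_action == "flag" and session_rule_decision is Decision.DENY:
            return Decision.FLAG
        return session_rule_decision

    fired = list(fired_actions)

    # 2. Only redact-only guardrails fired: MODIFY, whatever the policy action.
    if fired and all(action in REDACT_ONLY for action in fired):
        return Decision.MODIFY

    # 3. Any other detection follows the policy action.
    if any(action in BLOCKING for action in fired):
        return Decision.DENY if policy_action == "block" else Decision.FLAG
    if fired:
        # Assumption: flag-only ("pass") guardrails surface FLAG and never deny.
        return Decision.FLAG

    # 4. Nothing fired.
    return Decision.ALLOW
```

For example, `decide("flag", ["deny"])` yields `Decision.FLAG`, while the same guardrail on a block policy yields `Decision.DENY`.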
Because deny only ever results from a block policy, a flag policy can never reject a request — it is a hard cap at FLAG. See worked examples in Decision matrix.
Why a decision was made
Each decision is explainable: the response indicates the deciding factor (the matched rule or the firing guardrails) and which content was redacted, so you can audit exactly why a request was allowed, flagged, denied or modified.