Policy Engine

How Guardion turns raw detector results and accumulated session context into a single, explainable decision — allow, flag, deny or modify.

Detection answers *“is there a risk in this content?”*. The policy engine answers *“what should happen?”*. It combines three inputs (what the detectors found, how each guardrail is configured, and the session's accumulated risk) into one decision, which is returned on every `POST /v1/guard` call.

The four decisions

Every evaluation resolves to exactly one decision. The response fields `flagged`, `deny` and `redacted` are derived from it.

Decision	Meaning	`flagged`	`deny`	`redacted`
ALLOW	Clean — nothing tripped.	false	false	false
FLAG	Risk detected; allow the request but surface the signal.	true	false	—
DENY	Risk detected; reject the request.	true	true	—
MODIFY	Content cleaned (e.g. redacted); request proceeds.	false	false	true
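Read in reverse, the table also lets a client recover the decision from the three response fields. The following is a minimal sketch of that mapping, not Guardion's own code: the `Decision` enum and the `decision_from_fields` helper are hypothetical names introduced here, and the sketch assumes the three fields arrive as plain booleans.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "ALLOW"
    FLAG = "FLAG"
    DENY = "DENY"
    MODIFY = "MODIFY"


def decision_from_fields(flagged: bool, deny: bool, redacted: bool) -> Decision:
    """Recover the decision from the response fields, following the table above.

    Hypothetical helper: the field names come from the table, the function
    itself is an illustration only.
    """
    if deny:
        return Decision.DENY    # deny is true only for DENY
    if flagged:
        return Decision.FLAG    # flagged without deny
    if redacted:
        return Decision.MODIFY  # content cleaned, nothing flagged
    return Decision.ALLOW       # all three fields false
```

Because `deny` and `flagged` are checked first, the value of `redacted` only matters when neither is set, which matches the dashes in the FLAG and DENY rows.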

Two ingredients — the guardrail action and the policy action

Each guardrail declares what should happen when it fires — its *expected* action. This is the per-detector signal:

Guardrail action	Effect when it fires
`deny`	Treat as blocking — follows the policy action.
`follow`	Follows the policy action (block → deny, flag → flag).
`async` (redact)	Redact-only — clean the content, never blocks.
`pass`	Flag-only — surface the signal, never blocks.

The policy sets the enforcement mode via its action, which is the master switch over all blocking guardrails:

Policy action	Effect
`block`	Enforce: a blocking guardrail produces DENY.
`flag`	Monitor: nothing is denied — the strongest outcome is FLAG.
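The two tables can be read together as a lookup from (guardrail action, policy action) to the outcome a single firing guardrail contributes. A minimal sketch, assuming the action names are passed as the lowercase strings shown in the tables; `outcome_for_guardrail` is a hypothetical helper, not part of the Guardion API.

```python
def outcome_for_guardrail(guardrail_action: str, policy_action: str) -> str:
    """Outcome contributed by one firing guardrail under a given policy action.

    Hypothetical helper for illustration; the action names are taken from
    the two tables above.
    """
    if policy_action not in ("block", "flag"):
        raise ValueError(f"unknown policy action: {policy_action!r}")

    if guardrail_action in ("deny", "follow"):
        # Blocking guardrails follow the policy action.
        return "DENY" if policy_action == "block" else "FLAG"
    if guardrail_action == "async":
        # Redact-only: content is cleaned, the request is never blocked.
        return "MODIFY"
    if guardrail_action == "pass":
        # Flag-only: the signal is surfaced, the request is never blocked.
        return "FLAG"
    raise ValueError(f"unknown guardrail action: {guardrail_action!r}")
```

For example, `outcome_for_guardrail("follow", "flag")` returns `"FLAG"`, while the same guardrail under a `block` policy returns `"DENY"`.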

How a decision is reached

The engine evaluates signals in precedence order and the first match wins:

1. Session rules. Rules are evaluated over accumulated session context (risk score, repeated flags, intent drift, bot signals). If a rule matches, that rule determines the decision and the remaining steps are skipped. See Session risk & rules.

2. Redact-only guardrails → MODIFY. If only `async` (redact) guardrails fired, with no blocking guardrail alongside, the content is cleaned and the request proceeds, regardless of the policy action.

3. Detection → policy action. Any other detection follows the policy action: DENY on a block policy, FLAG on a flag policy.

4. Clean → ALLOW.

Because DENY only ever results from a `block` policy, a `flag` policy can never reject a request: FLAG is a hard cap. See worked examples in Decision matrix.
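The precedence order can be sketched as a single function. This is an illustration under stated assumptions, not the engine's implementation: `decide` and its parameters are hypothetical, a matched session rule is modelled as an already-computed decision string, the FLAG cap is assumed to apply to session rules as well, and a detection made up only of `pass` guardrails is assumed to yield FLAG, since `pass` is described as flag-only.

```python
from typing import Iterable, Optional

BLOCKING = {"deny", "follow"}              # guardrail actions that follow the policy
KNOWN = BLOCKING | {"async", "pass"}


def decide(
    fired_actions: Iterable[str],
    policy_action: str,
    session_rule_decision: Optional[str] = None,
) -> str:
    """First match wins, in the precedence order described above (hypothetical sketch)."""
    if policy_action not in ("block", "flag"):
        raise ValueError(f"unknown policy action: {policy_action!r}")
    fired = set(fired_actions)
    if fired - KNOWN:
        raise ValueError(f"unknown guardrail actions: {sorted(fired - KNOWN)}")

    # 1. Session rules: a matched rule decides.
    if session_rule_decision is not None:
        if session_rule_decision == "DENY" and policy_action == "flag":
            return "FLAG"  # assumption: the hard cap also covers session rules
        return session_rule_decision

    # 2. Only redact guardrails fired: MODIFY, independent of the policy action.
    if fired and fired <= {"async"}:
        return "MODIFY"

    # 3. Detection follows the policy action.
    if fired & BLOCKING:
        return "DENY" if policy_action == "block" else "FLAG"
    if fired:
        return "FLAG"  # assumption: only flag-only `pass` guardrails remain

    # 4. Nothing fired.
    return "ALLOW"
```

For example, `decide(["async"], "block")` returns `"MODIFY"`, and `decide(["follow", "async"], "flag")` returns `"FLAG"` because a blocking guardrail fired under a monitoring policy.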

Why a decision was made

Each decision is explainable: the response indicates the deciding factor (the matched rule or the firing guardrails) and which content was redacted, so you can audit exactly why a request was allowed, flagged, denied or modified.