Policy Engine

How Guardion turns raw detector results and accumulated session context into a single, explainable decision — allow, flag, deny or modify.

Detection answers *“is there a risk in this content?”* The policy engine answers *“what should happen?”* It combines three inputs (what the detectors found, how each guardrail is configured, and the session's accumulated risk) into one decision returned on every `POST /v1/guard` call.

The four decisions

Every evaluation resolves to exactly one decision. The response fields `flagged`, `deny` and `redacted` are derived from it.

| Decision | Meaning | `flagged` | `deny` | `redacted` |
|---|---|---|---|---|
| ALLOW | Clean; nothing tripped. | false | false | false |
| FLAG | Risk detected; allow the request but surface the signal. | true | false | |
| DENY | Risk detected; reject the request. | true | true | |
| MODIFY | Content cleaned (e.g. redacted); request proceeds. | false | false | true |

Two ingredients — the guardrail action and the policy action

Each guardrail declares what should happen when it fires — its *expected* action. This is the per-detector signal:

| Guardrail action | Effect when it fires |
|---|---|
| `deny` | Treat as blocking; follows the policy action. |
| `follow` | Follows the policy action (block → deny, flag → flag). |
| `async` (redact) | Redact-only: clean the content, never blocks. |
| `pass` | Flag-only: surface the signal, never blocks. |

The policy sets the enforcement mode via its action, which is the master switch over all blocking guardrails:

| Policy action | Effect |
|---|---|
| `block` | Enforce: a blocking guardrail produces DENY. |
| `flag` | Monitor: nothing is denied; the strongest outcome is FLAG. |
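
The way the two actions combine for a single fired guardrail can be sketched as a small lookup. This is an illustrative sketch, not Guardion's implementation: the function name `resolve_guardrail` and the plain string values are assumptions chosen to mirror the table entries above.

```python
def resolve_guardrail(guardrail_action: str, policy_action: str) -> str:
    """Outcome for ONE fired guardrail under a policy action (hypothetical helper)."""
    if policy_action not in ("block", "flag"):
        raise ValueError(f"unknown policy action: {policy_action!r}")

    if guardrail_action == "async":
        # Redact-only: content is cleaned, the request is never blocked.
        return "MODIFY"
    if guardrail_action == "pass":
        # Flag-only: the signal is surfaced, the request is never blocked.
        return "FLAG"
    if guardrail_action in ("deny", "follow"):
        # Blocking guardrails follow the policy's master switch.
        return "DENY" if policy_action == "block" else "FLAG"

    raise ValueError(f"unknown guardrail action: {guardrail_action!r}")


if __name__ == "__main__":
    for g in ("deny", "follow", "async", "pass"):
        for p in ("block", "flag"):
            print(f"{g:>6} + {p:<5} -> {resolve_guardrail(g, p)}")
```

Under these assumptions, no combination with a `flag` policy returns DENY, which is the "master switch" behaviour described above.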

How a decision is reached

The engine evaluates signals in precedence order and the first match wins:

1. Session rules. Rules evaluated over accumulated session context (risk score, repeated flags, intent drift, bot signals). If a rule matches, it decides. See Session risk & rules.

2. Redact-only guardrails → MODIFY. If only async/redact guardrails fired (no blocking guardrail alongside), the content is cleaned and the request proceeds — independent of the policy action.

3. Detection → policy action. Any other detection follows the policy action: DENY on a block policy, FLAG on a flag policy.

4. Clean → ALLOW.
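
The precedence order above can be sketched as a first-match-wins function. This is a minimal sketch under stated assumptions, not the engine's actual code: `decide`, its parameters, and the way a matched session rule is passed in as a ready-made decision are all hypothetical. It also assumes that flag-only (`pass`) guardrails stay at FLAG even on a `block` policy, as the guardrail-action table states.

```python
from typing import List, Optional

BLOCKING = {"deny", "follow"}   # guardrail actions that follow the policy action
REDACT_ONLY = {"async"}         # guardrail actions that only clean content


def decide(session_rule_decision: Optional[str],
           fired_actions: List[str],
           policy_action: str) -> str:
    """Return ALLOW, FLAG, DENY or MODIFY (hypothetical sketch).

    session_rule_decision: decision of a matched session rule, or None.
    fired_actions: guardrail actions of every guardrail that fired.
    policy_action: "block" or "flag".
    """
    # 1. A matching session rule decides outright.
    if session_rule_decision is not None:
        return session_rule_decision

    # 2. Only redact-only guardrails fired: clean the content and proceed,
    #    independent of the policy action.
    if fired_actions and all(a in REDACT_ONLY for a in fired_actions):
        return "MODIFY"

    # 3. Any other detection follows the policy action.
    if fired_actions:
        has_blocking = any(a in BLOCKING for a in fired_actions)
        if policy_action == "block" and has_blocking:
            return "DENY"
        return "FLAG"

    # 4. Nothing fired.
    return "ALLOW"
```

For example, `decide(None, ["async", "deny"], "block")` falls through step 2 because a blocking guardrail fired alongside the redact-only one, and resolves at step 3.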

Because deny only ever results from a block policy, a flag policy can never reject a request — it is a hard cap at FLAG. See worked examples in Decision matrix.

Why a decision was made

Each decision is explainable: the response indicates the deciding factor (the matched rule or the firing guardrails) and which content was redacted, so you can audit exactly why a request was allowed, flagged, denied or modified.