Safety & Quality Overview

Seclai provides three distinct layers that protect your AI pipelines from different categories of risk. Each layer operates at a different point in the pipeline, uses a different evaluation technique, and catches a different type of problem.

Three Layers of Protection

Prompt Scanning — Blocks prompt injection and jailbreaking attacks before content reaches any LLM.
Governance Policies — Screens content against your safety, privacy, and compliance rules, flagging or blocking violations.
Agent Evaluations — Validates that agent outputs meet your quality criteria, with optional auto-retry.

These layers are complementary. Prompt scanning prevents malicious inputs from entering the system. Governance ensures content complies with your policies. Agent evaluations confirm the output is useful and correct.

Prompt Scanning

Problem it solves: Adversarial users or compromised data sources attempt to hijack your agents via prompt injection or jailbreaking techniques.

How it works: A dedicated ML classifier (not an LLM) analyzes all incoming text at every platform ingress point, and also scans the outputs of steps that fetch data from external sources (Web Fetch, Web Search, Webhook Call). Malicious content is blocked instantly — before it is forwarded to any LLM, indexed for retrieval, or returned to users.

Key characteristics:

Always on, no configuration required
Sub-second latency, zero LLM cost
Runs before any other processing
Binary outcome: safe or blocked

→ Full documentation: Prompt Scanner

Governance Policies

Problem it solves: Agent outputs or ingested content may contain harmful, sensitive, or non-compliant material that violates your organization's policies — even when the input was perfectly benign.

How it works: You define natural-language policies describing what content is acceptable. An LLM evaluates content against these policies at configured screening points (agent input, step output, source content). Violations are flagged for review or blocked outright depending on your settings.

Key characteristics:

Configurable policies with natural-language descriptions
LLM-based evaluation (uses credits)
Screens both inputs and outputs
Three verdicts: pass, flag, or block
Configurable blocking mode (sync gate vs async audit)
Scoping hierarchy: account → agent → step → source

→ Full documentation: Governance

Agent Evaluations

Problem it solves: An agent's output may be technically "safe" according to governance policies but still low-quality, off-topic, incomplete, or unhelpful for the user's actual needs.

How it works: You define evaluation criteria that describe what "good output" looks like for specific agent steps. After a step produces output, an LLM scores it against your criteria. Depending on the evaluation mode, failing outputs can trigger automatic retries or be flagged for human review.

Key characteristics:

Custom quality criteria per agent step
Three modes: manual scoring, eval-and-retry, sample-and-flag
Targets specific steps (typically terminal output steps)
Scores on a 0–1 scale with configurable pass thresholds
Optional auto-retry when criteria aren't met

→ Full documentation: Agent Evaluations

Comparison Table

Aspect	Prompt Scanning	Governance Policies	Agent Evaluations
Purpose	Block injection attacks	Enforce content compliance	Validate output quality
What it checks	Inputs and external-source step outputs	Inputs and outputs	Step outputs only
When it runs	Before any processing; after external-source steps	At configured screening points	After step execution
Evaluation method	ML classifier	LLM-based policy evaluation	LLM-based criteria scoring
Configuration	None (always on)	Policy definitions, thresholds, scoping	Criteria descriptions, thresholds, modes
Outcomes	Block or pass	Pass, flag, or block	Pass, fail, retry, or flag
Credit cost	Zero	Per evaluation	Per evaluation
Latency impact	Minimal (~200ms)	Depends on tier and blocking mode	Depends on mode (sync or sampled)
Scope	Platform-wide	Account → Agent → Step → Source	Per agent, per step

How They Work Together

In a typical agent run, the three layers activate in sequence:

Input arrives → Prompt scanner checks for injection attacks. If unsafe, the run is blocked immediately.
Agent executes → Each step processes its input and produces output. Steps that fetch external data (Web Fetch, Web Search, Webhook Call) have their outputs automatically scanned for prompt injection before downstream steps can consume them.
Output is screened → Governance policies evaluate step outputs against your compliance rules. Violations are flagged or blocked.
Quality is assessed → Agent evaluations score the final output against your quality criteria. Failures may trigger retries.

This layered approach means:

A jailbreak attempt is caught at step 1 (prompt scanner) before any LLM call happens.
A prompt injection hidden in a fetched web page is caught at step 2 (output scanning) before it reaches downstream LLM steps.
A compliant but policy-violating output (e.g., containing PII) is caught at step 3 (governance).
A safe, compliant but low-quality answer is caught at step 4 (evaluations).

When to Use What

Prompt Scanning is always active — you don't need to do anything. It protects against adversarial inputs automatically.

Governance Policies are recommended when:

You need to enforce content rules (safety, PII, legal, brand guidelines)
You want to screen source content before it enters knowledge bases
You need an audit trail of policy evaluations
Different agents or sources need different policy sets

Agent Evaluations are recommended when:

You want to validate that outputs meet domain-specific quality standards
You want automatic retries when output quality is poor
You want to monitor production quality via sampling
Your quality criteria go beyond "safe/unsafe" into "helpful/accurate/complete"

Most production deployments use all three layers together.