Safety & Quality Overview
Seclai provides three distinct layers that protect your AI pipelines from different categories of risk. Each layer operates at a different point in the pipeline, uses a different evaluation technique, and catches a different type of problem.
Three Layers of Protection
- Prompt Scanning — Blocks prompt injection and jailbreaking attacks before content reaches any LLM.
- Governance Policies — Screens content against your safety, privacy, and compliance rules, flagging or blocking violations.
- Agent Evaluations — Validates that agent outputs meet your quality criteria, with optional auto-retry.
These layers are complementary. Prompt scanning prevents malicious inputs from entering the system. Governance ensures content complies with your policies. Agent evaluations confirm the output is useful and correct.
- Prompt Scanning
Problem it solves: Adversarial users or compromised data sources attempt to hijack your agents via prompt injection or jailbreaking techniques.
How it works: A dedicated ML classifier (not an LLM) analyzes all incoming text at every platform ingress point, and also scans the outputs of steps that fetch data from external sources (Web Fetch, Web Search, Webhook Call). Malicious content is blocked instantly — before it is forwarded to any LLM, indexed for retrieval, or returned to users.
Key characteristics:
- Always on, no configuration required
- Sub-second latency, zero LLM cost
- Runs before any other processing
- Binary outcome: safe or blocked
→ Full documentation: Prompt Scanner
- Governance Policies
Problem it solves: Agent outputs or ingested content may contain harmful, sensitive, or non-compliant material that violates your organization's policies — even when the input was perfectly benign.
How it works: You define natural-language policies describing what content is acceptable. An LLM evaluates content against these policies at configured screening points (agent input, step output, source content). Violations are flagged for review or blocked outright depending on your settings.
Key characteristics:
- Configurable policies with natural-language descriptions
- LLM-based evaluation (uses credits)
- Screens both inputs and outputs
- Three verdicts: pass, flag, or block
- Configurable blocking mode (sync gate vs async audit)
- Scoping hierarchy: account → agent → step → source
→ Full documentation: Governance
- Agent Evaluations
Problem it solves: An agent's output may be technically "safe" according to governance policies but still low-quality, off-topic, incomplete, or unhelpful for the user's actual needs.
How it works: You define evaluation criteria that describe what "good output" looks like for specific agent steps. After a step produces output, an LLM scores it against your criteria. Depending on the evaluation mode, failing outputs can trigger automatic retries or be flagged for human review.
Key characteristics:
- Custom quality criteria per agent step
- Three modes: manual scoring, eval-and-retry, sample-and-flag
- Targets specific steps (typically terminal output steps)
- Scores on a 0–1 scale with configurable pass thresholds
- Optional auto-retry when criteria aren't met
→ Full documentation: Agent Evaluations
Comparison Table
| Aspect | Prompt Scanning | Governance Policies | Agent Evaluations |
|---|---|---|---|
| Purpose | Block injection attacks | Enforce content compliance | Validate output quality |
| What it checks | Inputs and external-source step outputs | Inputs and outputs | Step outputs only |
| When it runs | Before any processing; after external-source steps | At configured screening points | After step execution |
| Evaluation method | ML classifier | LLM-based policy evaluation | LLM-based criteria scoring |
| Configuration | None (always on) | Policy definitions, thresholds, scoping | Criteria descriptions, thresholds, modes |
| Outcomes | Block or pass | Pass, flag, or block | Pass, fail, retry, or flag |
| Credit cost | Zero | Per evaluation | Per evaluation |
| Latency impact | Minimal (~200ms) | Depends on tier and blocking mode | Depends on mode (sync or sampled) |
| Scope | Platform-wide | Account → Agent → Step → Source | Per agent, per step |
How They Work Together
In a typical agent run, the three layers activate in sequence:
- Input arrives → Prompt scanner checks for injection attacks. If unsafe, the run is blocked immediately.
- Agent executes → Each step processes its input and produces output. Steps that fetch external data (Web Fetch, Web Search, Webhook Call) have their outputs automatically scanned for prompt injection before downstream steps can consume them.
- Output is screened → Governance policies evaluate step outputs against your compliance rules. Violations are flagged or blocked.
- Quality is assessed → Agent evaluations score the final output against your quality criteria. Failures may trigger retries.
This layered approach means:
- A jailbreak attempt is caught at step 1 (prompt scanner) before any LLM call happens.
- A prompt injection hidden in a fetched web page is caught at step 2 (output scanning) before it reaches downstream LLM steps.
- A compliant but policy-violating output (e.g., containing PII) is caught at step 3 (governance).
- A safe, compliant but low-quality answer is caught at step 4 (evaluations).
When to Use What
Prompt Scanning is always active — you don't need to do anything. It protects against adversarial inputs automatically.
Governance Policies are recommended when:
- You need to enforce content rules (safety, PII, legal, brand guidelines)
- You want to screen source content before it enters knowledge bases
- You need an audit trail of policy evaluations
- Different agents or sources need different policy sets
Agent Evaluations are recommended when:
- You want to validate that outputs meet domain-specific quality standards
- You want automatic retries when output quality is poor
- You want to monitor production quality via sampling
- Your quality criteria go beyond "safe/unsafe" into "helpful/accurate/complete"
Most production deployments use all three layers together.