Alerts
Alerts notify you when agent runs or content source pulls match specific conditions — failures, performance regressions, error rate spikes, or unusual activity. Configure them once, and Seclai monitors your resources 24/7 so you don't have to.
Overview
The Alerts system has two sides:
- Alert Configuration — Rules you define that specify when to trigger a notification (e.g., "alert me after 3 consecutive failures")
- Alert Instances — Individual triggered alerts created when a rule fires (e.g., "Agent 'Daily Report' failed 3 times in a row at 2:14 PM")
You manage both from the Alerts section in the left sidebar.
Alert Scopes
Alerts can be configured at three levels:
| Scope | Where to Configure | Applies To |
|---|---|---|
| Account-wide | Alerts → Agent Alerts / Content Source Alerts tabs | All agents or all sources in the account |
| Per-agent | Agent → Alerts tab | A specific agent only |
| Per-source | Content Source → Alerts tab | A specific content source only |
Per-resource alerts override account-wide settings for that resource, giving you fine-grained control.
Example: You might set a conservative account-wide "consecutive failures" threshold of 5, then override it to 2 for a business-critical agent that needs faster notification.
Agent Alert Types
Five alert types are available for monitoring agent runs:
Run Failed
Triggers an alert every time an agent run fails, regardless of context. This is the simplest alert type — useful for agents where any failure is unacceptable.
| Setting | Value |
|---|---|
| Thresholds | None — every failure triggers |
| Best for | Critical, low-volume agents |
Example use case: A financial reporting agent that runs once daily. Every failure must be investigated immediately.
Consecutive Failures
Triggers an alert when an agent fails a configurable number of times in a row. This filters out one-off transient errors and only alerts on persistent issues.
| Setting | Range | Default | Description |
|---|---|---|---|
| Count | 2 – 100 | 3 | Number of consecutive failures before alerting |
Example use case: A web scraping agent that occasionally encounters timeout errors. Setting the count to 3 means you're only alerted when the upstream site is genuinely down, not when a single request times out.
Example configuration:
Alert type: Consecutive Failures
Count: 3
Cooldown: 60 minutes
→ Alert fires after the 3rd failure in a row
→ Won't fire again for at least 60 minutes
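The streak logic is simple to reason about: the rule fires only when the most recent `Count` runs all failed, so a single success resets the streak. A minimal Python sketch of that check (function name and data shape are illustrative assumptions, not Seclai's actual implementation):

```python
def should_fire(recent_results, count=3):
    """recent_results: run outcomes, oldest first; True = success.

    Illustrative sketch of a consecutive-failures rule: fire only
    when the last `count` runs all failed.
    """
    if len(recent_results) < count:
        return False  # not enough history for a full streak
    return all(not ok for ok in recent_results[-count:])

print(should_fire([True, False, False, False], count=3))   # → True (3 failures in a row)
print(should_fire([False, True, False, False], count=3))   # → False (streak broken by a success)
```

Note that any success interrupts the streak, which is exactly why this type filters out one-off transient errors.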
Error Rate Spike
Triggers when the failure rate within a sliding window of runs exceeds a threshold. Ideal for high-volume agents where some failures are tolerable but a spike indicates a systemic problem.
| Setting | Range | Default | Description |
|---|---|---|---|
| Rate | 0.01 – 1.0 | 0.5 | Failure rate threshold (e.g., 0.5 = 50%) |
| Window runs | 5 – 1,000 | 20 | Number of recent runs to evaluate |
Example use case: A customer support agent that handles hundreds of conversations daily. A 5% failure rate is normal, but if it spikes to 50% across the last 20 runs, something is wrong.
Example configuration:
Alert type: Error Rate Spike
Rate: 0.25 (25%)
Window runs: 50
Cooldown: 120 minutes
→ Alert fires when 13+ of the last 50 runs failed (≥25%)
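Conceptually, the rule divides failures by the window size and compares the result against the rate threshold. A hedged sketch of that evaluation, assuming the alert fires once the rate meets or exceeds the threshold and is skipped until the window is full (names are illustrative, not Seclai's implementation):

```python
def error_rate_spiked(recent_runs, rate=0.5, window_runs=20):
    """recent_runs: run outcomes, oldest first; True = success.

    Illustrative sketch: evaluate the failure rate over the last
    `window_runs` runs and compare it to the `rate` threshold.
    """
    window = recent_runs[-window_runs:]
    if len(window) < window_runs:
        return False  # not enough history to fill the window yet
    failures = sum(1 for ok in window if not ok)
    return failures / window_runs >= rate

# 13 failures out of 50 runs = 26%, which meets a 25% threshold.
print(error_rate_spiked([False] * 13 + [True] * 37, rate=0.25, window_runs=50))  # → True
```

With this semantics, 13 is the smallest failure count that trips a 25% threshold over 50 runs, matching the configuration above.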
Run Burst
Triggers when too many runs start within a short time window. This detects unusual activity — accidental loops, abuse, or misconfigured triggers.
| Setting | Range | Default | Description |
|---|---|---|---|
| Max runs | 2 – 10,000 | 50 | Maximum allowed runs in the time window |
| Window minutes | 1 – 1,440 | 10 | Length of the evaluation window (in minutes) |
Example use case: An agent triggered by content updates. If a bulk import accidentally adds 500 items at once, the run burst alert fires so you can investigate before consuming thousands of credits.
Example configuration:
Alert type: Run Burst
Max runs: 100
Window minutes: 15
Cooldown: 30 minutes
→ Alert fires when more than 100 runs start within any 15-minute window
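A burst check like this can be implemented as a sliding window over run start times. The sketch below assumes the rule fires as soon as the count of runs inside any window exceeds Max runs (an illustrative reading of "maximum allowed", not Seclai's published implementation):

```python
from datetime import datetime, timedelta

def burst_detected(start_times, max_runs=50, window_minutes=10):
    """start_times: run start timestamps, sorted ascending.

    Illustrative sliding-window sketch: fire when more than
    `max_runs` runs started within any `window_minutes` span.
    """
    window = timedelta(minutes=window_minutes)
    lo = 0
    for hi, t in enumerate(start_times):
        # Slide the window's left edge forward until it fits.
        while t - start_times[lo] > window:
            lo += 1
        if hi - lo + 1 > max_runs:  # count in window exceeds the cap
            return True
    return False

base = datetime(2024, 1, 1, 12, 0)
runs = [base + timedelta(seconds=i) for i in range(6)]
print(burst_detected(runs, max_runs=5, window_minutes=10))  # → True (6 runs, cap is 5)
```

The two-pointer scan keeps the check linear in the number of runs, which matters for high-volume agents.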
Slow Run
Triggers when a run takes significantly longer than the agent's historical p95 (95th percentile) duration. This catches performance regressions that might indicate upstream issues, model slowdowns, or inefficient step configurations.
| Setting | Range | Default | Description |
|---|---|---|---|
| P95 multiplier | 1 – 10 | 2 | Multiple of the p95 duration a run must exceed to trigger |
| Min duration (seconds) | 0 – 86,400 | 30 | Minimum run duration considered (filters out fast runs) |
| Min historical runs | 1 – 1,000 | 10 | Minimum number of past runs needed to calculate a reliable p95 |
Example use case: A retrieval agent that normally completes in 5 seconds. If a model provider experiences latency issues and runs start taking 30+ seconds, the slow run alert fires.
Example configuration:
Alert type: Slow Run
P95 multiplier: 2.5
Min duration: 10 seconds
Min historical runs: 20
Cooldown: 60 minutes
→ Alert fires when a run exceeds 2.5× the p95 duration
→ Only if the run took at least 10 seconds
→ Only once at least 20 historical runs exist for comparison
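Putting the three settings together: the run must be long enough to matter, enough history must exist, and the duration must exceed multiplier × p95. An illustrative Python sketch using the nearest-rank method for p95 (Seclai's exact percentile method isn't documented here, so treat this as an assumption):

```python
import math

def is_slow_run(duration_s, history_s, multiplier=2.0,
                min_duration_s=30, min_historical_runs=10):
    """history_s: past run durations in seconds.

    Illustrative slow-run check: compare a run's duration against
    a multiple of the historical p95.
    """
    if duration_s < min_duration_s:
        return False  # too fast to be worth alerting on
    if len(history_s) < min_historical_runs:
        return False  # not enough data for a reliable p95
    ranked = sorted(history_s)
    # Nearest-rank p95: the value at the 95th-percentile position.
    p95 = ranked[math.ceil(0.95 * len(ranked)) - 1]
    return duration_s > p95 * multiplier

history = list(range(1, 21))  # 1..20 seconds; p95 = 19s
print(is_slow_run(50, history, multiplier=2.0))  # → True (50s > 2 × 19s)
print(is_slow_run(20, history, multiplier=2.0))  # → False (below the 30s min duration)
```

The min-duration and min-history guards exist precisely so the alert stays quiet on trivially fast runs and on agents without enough track record.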
Source Alert Types
Three alert types are available for monitoring content source pulls. The terminology adapts based on the source type:
- Websites and RSS feeds use the term "pull"
- File upload sources use the term "upload"
Pull Failed
Triggers every time a source pull (or upload) fails.
| Setting | Value |
|---|---|
| Thresholds | None — every failure triggers |
| Best for | Critical sources where freshness matters |
Consecutive Pull Failures
Triggers after a configurable number of consecutive pull failures.
| Setting | Range | Default | Description |
|---|---|---|---|
| Count | 2 – 100 | 3 | Consecutive failures before alerting |
Example use case: An RSS feed that occasionally returns 503 during maintenance windows. Set count to 3 so you're only alerted when the feed is persistently unreachable.
Pull Error Rate Spike
Triggers when the pull failure rate in a window exceeds a threshold.
| Setting | Range | Default | Description |
|---|---|---|---|
| Rate | 0.01 – 1.0 | 0.5 | Failure rate threshold |
| Window pulls | 3 – 100 | 10 | Number of recent pulls to evaluate |
Configuring Alerts
Creating an Alert Configuration
- Navigate to Alerts in the left sidebar (for account-wide), or open an Agent / Content Source and click the Alerts tab
- Find the alert type you want to enable
- Toggle the switch to activate it
- Configure the threshold settings for your use case
- Set the cooldown period
- Choose the notification recipients
- Click Save Settings
Threshold Settings
Every alert type has threshold parameters that control when it fires. These vary by type — see the detailed tables in each alert type section above. All thresholds have validated ranges to prevent misconfiguration.
Cooldown Period
The cooldown controls how long Seclai waits after firing an alert before it can fire the same alert again. This prevents alert fatigue during extended outages.
| Setting | Range | Default |
|---|---|---|
| Cooldown | 1 – 1,440 minutes | 60 minutes |
Example: With a 60-minute cooldown, if a "Run Failed" alert fires at 2:00 PM, the next alert of the same type won't fire until after 3:00 PM, even if additional failures occur in between.
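The cooldown behaves like a per-rule rate limiter: record when each alert last fired and suppress repeats until the cooldown elapses. A minimal illustrative sketch of that gate (not Seclai's implementation):

```python
from datetime import datetime, timedelta

class CooldownGate:
    """Illustrative sketch: suppress repeat firings of the same
    alert rule during its cooldown window."""

    def __init__(self, cooldown_minutes=60):
        self.cooldown = timedelta(minutes=cooldown_minutes)
        self.last_fired = {}  # alert key -> time it last fired

    def try_fire(self, key, now):
        last = self.last_fired.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down; suppress
        self.last_fired[key] = now
        return True

gate = CooldownGate(cooldown_minutes=60)
t = datetime(2024, 1, 1, 14, 0)                      # 2:00 PM
print(gate.try_fire("run_failed", t))                 # → True (fires)
print(gate.try_fire("run_failed", t + timedelta(minutes=30)))  # → False (suppressed)
print(gate.try_fire("run_failed", t + timedelta(minutes=61)))  # → True (cooldown elapsed)
```

Note the suppressed attempt does not reset the timer; only an actual firing starts a new cooldown, which is what makes the 2:00 PM → after 3:00 PM example above hold.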
Disabling an Alert
Toggle the switch off to disable an alert configuration without deleting it. Your settings (thresholds, cooldown, recipients) are preserved and can be re-enabled at any time.
Removing an Alert
Click Remove (red button) to permanently delete an alert configuration. This cannot be undone.
Alert Lifecycle
When an alert configuration fires, it creates an alert instance that progresses through a lifecycle:
Triggered → Acknowledged → Resolved
↘ Dismissed
| Status | Meaning | Color |
|---|---|---|
| Triggered | Alert has fired and needs attention | Red |
| Acknowledged | Someone is investigating the issue | Yellow |
| Resolved | The issue has been fixed | Green |
| Dismissed | The alert was a false positive or not actionable | Gray |
Status Transitions
| From | To | When to Use |
|---|---|---|
| Triggered | Acknowledged | You've seen the alert and are investigating |
| Triggered | Resolved | The issue was already fixed or resolved itself |
| Triggered | Dismissed | False positive or not worth investigating |
| Acknowledged | Resolved | Investigation complete, issue fixed |
| Acknowledged | Dismissed | Investigation revealed a false positive |
Each status change can include an optional note explaining the transition.
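The transition table above amounts to a small state machine: Triggered can move to any other status, Acknowledged can only close out, and Resolved and Dismissed are terminal. An illustrative sketch of that rule (status names lowercased for the example; not Seclai's code):

```python
# Allowed status transitions, per the table above.
ALLOWED_TRANSITIONS = {
    "triggered": {"acknowledged", "resolved", "dismissed"},
    "acknowledged": {"resolved", "dismissed"},
    # "resolved" and "dismissed" are terminal: no outgoing transitions.
}

def can_transition(current, target):
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())

print(can_transition("triggered", "acknowledged"))  # → True
print(can_transition("resolved", "triggered"))      # → False (terminal state)
```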
Filtering Alerts
The alerts list supports filtering by:
- Status — All, Triggered, Acknowledged, Resolved, or Dismissed
- Time frame — Same time frame selector used across the Dashboard
The table displays:
| Column | Description |
|---|---|
| Status | Color-coded badge |
| Type | Alert type (e.g., "Consecutive Failures") |
| Title | Human-readable summary |
| Triggered | Date and time the alert fired |
| Comments | Number of comments on the alert |
| Subscribers | Number of users subscribed to updates |
Click any row to open the alert detail page.
Alert Detail Page
Each alert instance has a dedicated detail page with full context and collaboration features.
Metadata
- Alert type — Which alert rule triggered
- Triggered date — When the alert was created
- Updated date — Last time the alert was modified
- Current status — Color-coded badge
Structured Details
The detail view shows context specific to each alert type:
| Alert Type | Details Shown |
|---|---|
| Run Failed | Failed step ID, step type, error message |
| Consecutive Failures | Failed step info, consecutive failure count |
| Error Rate Spike | Current error rate, failed/total runs, configured threshold |
| Run Burst | Run count, window duration, configured threshold |
| Slow Run | Run duration, p95 duration, multiplier, slow threshold, historical run count |
Status History
A timeline tracks every status change, showing:
- Who made the change (user name)
- When it happened (timestamp)
- The status transition
- Any note attached to the change
Comments
Add comments to discuss the alert with your team:
- Type your message in the text area
- Click Comment
Comments show the author's name, timestamp, and message body. Use them to document investigation findings, root cause analysis, or remediation steps.
Subscriptions
Subscribe to an alert instance to receive updates when its status changes or new comments are added.
- Subscribe — Click the Subscribe button to follow the alert
- Unsubscribe — Click Unsubscribe to stop receiving updates
- The subscriber count is visible in the alert list and detail page
Notifications
When configuring an alert, you choose who receives email notifications when it fires.
Personal Accounts
Notifications are sent to the email associated with your account. No additional configuration needed.
Organization Accounts
Three distribution options:
| Option | Who Receives Notifications |
|---|---|
| Account owner only | Only the account owner |
| Owner & administrators | Owner plus all users with administrator role |
| Selected members | Specific organization members you choose |
When "Selected members" is chosen, a searchable multi-select dropdown appears where you pick individual team members.
Example: For a critical production agent, select "Owner & administrators" so the on-call team is always notified. For a lower-priority staging agent, select "Account owner only."
Permissions
| Role | View Alerts | Configure Alerts | Change Status / Comment |
|---|---|---|---|
| Owner | ✅ | ✅ | ✅ |
| Admin | ✅ | ✅ | ✅ |
| Editor | ✅ | ✅ | ✅ |
| Viewer | ❌ Redirected | ❌ 403 Forbidden | ❌ 403 Forbidden |
Examples
Example: Critical Agent Monitoring
For a daily financial reporting agent that must never fail silently:
Account-wide alerts (baseline):
• Consecutive Failures: count=5, cooldown=120 min
• Error Rate Spike: rate=0.3, window=50, cooldown=60 min
Per-agent overrides (Financial Reporter):
• Run Failed: enabled, cooldown=10 min
• Consecutive Failures: count=2, cooldown=15 min
• Slow Run: multiplier=1.5, min_duration=60s, cooldown=30 min
Notification: Owner & administrators
Example: High-Volume Agent with Burst Detection
For a customer support bot handling thousands of runs per day:
Per-agent configuration:
• Error Rate Spike: rate=0.10 (10%), window=100, cooldown=60 min
• Run Burst: max_runs=500, window=5 min, cooldown=30 min
• Slow Run: multiplier=3, min_duration=15s, min_runs=50, cooldown=120 min
Notification: Selected members → [ops-team@company.com, lead@company.com]
Example: Content Source Freshness Monitoring
For RSS feeds that must stay current:
Account-wide source alerts:
• Consecutive Pull Failures: count=3, cooldown=60 min
• Pull Error Rate Spike: rate=0.5, window=10, cooldown=120 min
Per-source overrides (Primary News Feed):
• Pull Failed: enabled, cooldown=15 min
• Consecutive Pull Failures: count=2, cooldown=30 min
Notification: Account owner only
Next Steps
- Dashboard — Monitor aggregated metrics for agents, sources, and credits
- Agents — Learn about creating and configuring agents
- Content Sources — Set up sources for your knowledge bases
- Organizations — Manage team members and notification recipients