Alerts
Alerts notify you when agent runs or content source pulls match specific conditions — failures, performance regressions, error rate spikes, or unusual activity. Configure them once, and Seclai monitors your resources 24/7 so you don't have to.
Overview
The Alerts system has two sides:
- Alert Configuration — Rules you define that specify when to trigger a notification (e.g., "alert me after 3 consecutive failures")
- Alert Instances — Individual triggered alerts created when a rule fires (e.g., "Agent 'Daily Report' failed 3 times in a row at 2:14 PM")
You manage both from the Alerts section in the left sidebar.
Alert Scopes
Alerts can be configured at three levels:
| Scope | Where to Configure | Applies To |
|---|---|---|
| Account-wide | Alerts → Agent Alerts / Content Source Alerts tabs | All agents or all sources in the account |
| Per-agent | Agent → Alerts tab | A specific agent only |
| Per-source | Content Source → Alerts tab | A specific content source only |
Per-resource alerts override account-wide settings for that resource, giving you fine-grained control.
Example: You might set a conservative account-wide "consecutive failures" threshold of 5, then override it to 2 for a business-critical agent that needs faster notification.
Agent Alert Types
Five alert types are available for monitoring agent runs:
Run Failed
Triggers an alert every time an agent run fails, regardless of context. This is the simplest alert type — useful for agents where any failure is unacceptable.
| Setting | Value |
|---|---|
| Thresholds | None — every failure triggers |
| Best for | Critical, low-volume agents |
Example use case: A financial reporting agent that runs once daily. Every failure must be investigated immediately.
Consecutive Failures
Triggers an alert when an agent fails a configurable number of times in a row. This filters out one-off transient errors and only alerts on persistent issues.
| Setting | Range | Default | Description |
|---|---|---|---|
| Count | 2 – 100 | 3 | Number of consecutive failures before alerting |
Example use case: A web scraping agent that occasionally encounters timeout errors. Setting the count to 3 means you're only alerted when the upstream site is genuinely down, not when a single request times out.
Example configuration:
Alert type: Consecutive Failures
Count: 3
Cooldown: 60 minutes
→ Alert fires after the 3rd failure in a row
→ Won't fire again for at least 60 minutes
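The streak logic is simple to reason about: the rule fires only when the most recent `Count` runs all failed, so a single success resets the streak. A minimal Python sketch of that check (function name and data shape are illustrative assumptions, not Seclai's actual implementation):

```python
def should_fire(recent_results, count=3):
    """recent_results: run outcomes, oldest first; True = success.

    Illustrative sketch of a consecutive-failures rule: fire only
    when the last `count` runs all failed.
    """
    if len(recent_results) < count:
        return False  # not enough history for a full streak
    return all(not ok for ok in recent_results[-count:])

print(should_fire([True, False, False, False], count=3))   # → True (3 failures in a row)
print(should_fire([False, True, False, False], count=3))   # → False (streak broken by a success)
```

Note that any success interrupts the streak, which is exactly why this type filters out one-off transient errors.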
Error Rate Spike
Triggers when the failure rate within a sliding window of runs exceeds a threshold. Ideal for high-volume agents where some failures are tolerable but a spike indicates a systemic problem.
| Setting | Range | Default | Description |
|---|---|---|---|
| Rate | 0.01 – 1.0 | 0.5 | Failure rate threshold (e.g., 0.5 = 50%) |
| Window runs | 5 – 1,000 | 20 | Number of recent runs to evaluate |
Example use case: A customer support agent that handles hundreds of conversations daily. A 5% failure rate is normal, but if it spikes to 50% across the last 20 runs, something is wrong.
Example configuration:
Alert type: Error Rate Spike
Rate: 0.25 (25%)
Window runs: 50
Cooldown: 120 minutes
→ Alert fires when 13+ of the last 50 runs failed (≥25%)
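Conceptually, the rule divides failures by the window size and compares the result against the rate threshold. A hedged sketch of that evaluation, assuming the alert fires once the rate meets or exceeds the threshold and is skipped until the window is full (names are illustrative, not Seclai's implementation):

```python
def error_rate_spiked(recent_runs, rate=0.5, window_runs=20):
    """recent_runs: run outcomes, oldest first; True = success.

    Illustrative sketch: evaluate the failure rate over the last
    `window_runs` runs and compare it to the `rate` threshold.
    """
    window = recent_runs[-window_runs:]
    if len(window) < window_runs:
        return False  # not enough history to fill the window yet
    failures = sum(1 for ok in window if not ok)
    return failures / window_runs >= rate

# 13 failures out of 50 runs = 26%, which meets a 25% threshold.
print(error_rate_spiked([False] * 13 + [True] * 37, rate=0.25, window_runs=50))  # → True
```

With this semantics, 13 is the smallest failure count that trips a 25% threshold over 50 runs, matching the configuration above.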
Run Burst
Triggers when too many runs start within a short time window. This detects unusual activity — accidental loops, abuse, or misconfigured triggers.
| Setting | Range | Default | Description |
|---|---|---|---|
| Max runs | 2 – 10,000 | 50 | Maximum allowed runs in the time window |
| Window minutes | 1 – 1,440 | 10 | Length of the evaluation window (in minutes) |
Example use case: An agent triggered by content updates. If a bulk import accidentally adds 500 items at once, the run burst alert fires so you can investigate before consuming thousands of credits.
Example configuration:
Alert type: Run Burst
Max runs: 100
Window minutes: 15
Cooldown: 30 minutes
→ Alert fires when more than 100 runs start within any 15-minute window
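A burst check like this can be implemented as a sliding window over run start times. The sketch below assumes the rule fires as soon as the count of runs inside any window exceeds Max runs (an illustrative reading of "maximum allowed", not Seclai's published implementation):

```python
from datetime import datetime, timedelta

def burst_detected(start_times, max_runs=50, window_minutes=10):
    """start_times: run start timestamps, sorted ascending.

    Illustrative sliding-window sketch: fire when more than
    `max_runs` runs started within any `window_minutes` span.
    """
    window = timedelta(minutes=window_minutes)
    lo = 0
    for hi, t in enumerate(start_times):
        # Slide the window's left edge forward until it fits.
        while t - start_times[lo] > window:
            lo += 1
        if hi - lo + 1 > max_runs:  # count in window exceeds the cap
            return True
    return False

base = datetime(2024, 1, 1, 12, 0)
runs = [base + timedelta(seconds=i) for i in range(6)]
print(burst_detected(runs, max_runs=5, window_minutes=10))  # → True (6 runs, cap is 5)
```

The two-pointer scan keeps the check linear in the number of runs, which matters for high-volume agents.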
Slow Run
Triggers when a run takes significantly longer than the agent's historical p95 (95th percentile) duration. This catches performance regressions that might indicate upstream issues, model slowdowns, or inefficient step configurations.
| Setting | Range | Default | Description |
|---|---|---|---|
| P95 multiplier | 1 – 10 | 2 | Multiple of the p95 duration a run must exceed to trigger |
| Min duration (seconds) | 0 – 86,400 | 30 | Minimum run duration considered (filters out fast runs) |
| Min historical runs | 1 – 1,000 | 10 | Minimum number of past runs needed to calculate a reliable p95 |
Example use case: A retrieval agent that normally completes in 5 seconds. If a model provider experiences latency issues and runs start taking 30+ seconds, the slow run alert fires.
Example configuration:
Alert type: Slow Run
P95 multiplier: 2.5
Min duration: 10 seconds
Min historical runs: 20
Cooldown: 60 minutes
→ Alert fires when a run exceeds 2.5× the p95 duration
→ Only if the run took at least 10 seconds
→ Only once at least 20 historical runs exist for comparison
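Putting the three settings together: the run must be long enough to matter, enough history must exist, and the duration must exceed multiplier × p95. An illustrative Python sketch using the nearest-rank method for p95 (Seclai's exact percentile method isn't documented here, so treat this as an assumption):

```python
import math

def is_slow_run(duration_s, history_s, multiplier=2.0,
                min_duration_s=30, min_historical_runs=10):
    """history_s: past run durations in seconds.

    Illustrative slow-run check: compare a run's duration against
    a multiple of the historical p95.
    """
    if duration_s < min_duration_s:
        return False  # too fast to be worth alerting on
    if len(history_s) < min_historical_runs:
        return False  # not enough data for a reliable p95
    ranked = sorted(history_s)
    # Nearest-rank p95: the value at the 95th-percentile position.
    p95 = ranked[math.ceil(0.95 * len(ranked)) - 1]
    return duration_s > p95 * multiplier

history = list(range(1, 21))  # 1..20 seconds; p95 = 19s
print(is_slow_run(50, history, multiplier=2.0))  # → True (50s > 2 × 19s)
print(is_slow_run(20, history, multiplier=2.0))  # → False (below the 30s min duration)
```

The min-duration and min-history guards exist precisely so the alert stays quiet on trivially fast runs and on agents without enough track record.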
Source Alert Types
Three alert types are available for monitoring content source pulls. The terminology adapts based on the source type:
- Websites and RSS feeds use the term "pull"
- File upload sources use the term "upload"
Pull Failed
Triggers every time a source pull (or upload) fails.
| Setting | Value |
|---|---|
| Thresholds | None — every failure triggers |
| Best for | Critical sources where freshness matters |
Consecutive Pull Failures
Triggers after a configurable number of consecutive pull failures.
| Setting | Range | Default | Description |
|---|---|---|---|
| Count | 2 – 100 | 3 | Consecutive failures before alerting |
Example use case: An RSS feed that occasionally returns 503 during maintenance windows. Set count to 3 so you're only alerted when the feed is persistently unreachable.
Pull Error Rate Spike
Triggers when the pull failure rate in a window exceeds a threshold.
| Setting | Range | Default | Description |
|---|---|---|---|
| Rate | 0.01 – 1.0 | 0.5 | Failure rate threshold |
| Window pulls | 3 – 100 | 10 | Number of recent pulls to evaluate |
Configuring Alerts
Creating an Alert Configuration
- Navigate to Alerts in the left sidebar (for account-wide), or open an Agent / Content Source and click the Alerts tab
- Find the alert type you want to enable
- Toggle the switch to activate it
- Configure the threshold settings for your use case
- Set the cooldown period
- Choose the notification recipients
- Click Save Settings
Threshold Settings
Every alert type has threshold parameters that control when it fires. These vary by type — see the detailed tables in each alert type section above. All thresholds have validated ranges to prevent misconfiguration.
Cooldown Period
The cooldown controls how long Seclai waits after firing an alert before it can fire the same alert again. This prevents alert fatigue during extended outages.
| Setting | Range | Default |
|---|---|---|
| Cooldown | 1 – 1,440 minutes | 60 minutes |
Example: With a 60-minute cooldown, if a "Run Failed" alert fires at 2:00 PM, the next alert of the same type won't fire until after 3:00 PM, even if additional failures occur in between.
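The cooldown behaves like a per-rule rate limiter: record when each alert last fired and suppress repeats until the cooldown elapses. A minimal illustrative sketch of that gate (not Seclai's implementation):

```python
from datetime import datetime, timedelta

class CooldownGate:
    """Illustrative sketch: suppress repeat firings of the same
    alert rule during its cooldown window."""

    def __init__(self, cooldown_minutes=60):
        self.cooldown = timedelta(minutes=cooldown_minutes)
        self.last_fired = {}  # alert key -> time it last fired

    def try_fire(self, key, now):
        last = self.last_fired.get(key)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down; suppress
        self.last_fired[key] = now
        return True

gate = CooldownGate(cooldown_minutes=60)
t = datetime(2024, 1, 1, 14, 0)                      # 2:00 PM
print(gate.try_fire("run_failed", t))                 # → True (fires)
print(gate.try_fire("run_failed", t + timedelta(minutes=30)))  # → False (suppressed)
print(gate.try_fire("run_failed", t + timedelta(minutes=61)))  # → True (cooldown elapsed)
```

Note the suppressed attempt does not reset the timer; only an actual firing starts a new cooldown, which is what makes the 2:00 PM → after 3:00 PM example above hold.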
Disabling an Alert
Toggle the switch off to disable an alert configuration without deleting it. Your settings (thresholds, cooldown, recipients) are preserved and can be re-enabled at any time.
Removing an Alert
Click Remove (red button) to permanently delete an alert configuration. This cannot be undone.
Alert Lifecycle
When an alert configuration fires, it creates an alert instance that progresses through a lifecycle:
Triggered → Acknowledged → Resolved
↘ Dismissed
| Status | Meaning | Color |
|---|---|---|
| Triggered | Alert has fired and needs attention | Red |
| Acknowledged | Someone is investigating the issue | Yellow |
| Resolved | The issue has been fixed | Green |
| Dismissed | The alert was a false positive or not actionable | Gray |
Status Transitions
| From | To | When to Use |
|---|---|---|
| Triggered | Acknowledged | You've seen the alert and are investigating |
| Triggered | Resolved | The issue was already fixed or resolved itself |
| Triggered | Dismissed | False positive or not worth investigating |
| Acknowledged | Resolved | Investigation complete, issue fixed |
| Acknowledged | Dismissed | Investigation revealed a false positive |
Each status change can include an optional note explaining the transition.
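The transition table above amounts to a small state machine: Triggered can move to any other status, Acknowledged can only close out, and Resolved and Dismissed are terminal. An illustrative sketch of that rule (status names lowercased for the example; not Seclai's code):

```python
# Allowed status transitions, per the table above.
ALLOWED_TRANSITIONS = {
    "triggered": {"acknowledged", "resolved", "dismissed"},
    "acknowledged": {"resolved", "dismissed"},
    # "resolved" and "dismissed" are terminal: no outgoing transitions.
}

def can_transition(current, target):
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())

print(can_transition("triggered", "acknowledged"))  # → True
print(can_transition("resolved", "triggered"))      # → False (terminal state)
```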
Filtering Alerts
The alerts list supports filtering by:
- Status — All, Triggered, Acknowledged, Resolved, or Dismissed
- Time frame — Same time frame selector used across the Dashboard
The table displays:
| Column | Description |
|---|---|
| Status | Color-coded badge |
| Type | Alert type (e.g., "Consecutive Failures") |
| Title | Human-readable summary |
| Triggered | Date and time the alert fired |
| Comments | Number of comments on the alert |
| Subscribers | Number of users subscribed to updates |
Click any row to open the alert detail page.
Alert Detail Page
Each alert instance has a dedicated detail page with full context and collaboration features.
Metadata
- Alert type — Which alert rule triggered
- Triggered date — When the alert was created
- Updated date — Last time the alert was modified
- Current status — Color-coded badge
Structured Details
The detail view shows context specific to each alert type:
| Alert Type | Details Shown |
|---|---|
| Run Failed | Failed step ID, step type, error message |
| Consecutive Failures | Failed step info, consecutive failure count |
| Error Rate Spike | Current error rate, failed/total runs, configured threshold |
| Run Burst | Run count, window duration, configured threshold |
| Slow Run | Run duration, p95 duration, multiplier, slow threshold, historical run count |
Status History
A timeline tracks every status change, showing:
- Who made the change (user name)
- When it happened (timestamp)
- The status transition
- Any note attached to the change
Comments
Add comments to discuss the alert with your team:
- Type your message in the text area
- Click Comment
Comments show the author's name, timestamp, and message body. Use them to document investigation findings, root cause analysis, or remediation steps.
Subscriptions
Subscribe to an alert instance to receive updates when its status changes or new comments are added.
- Subscribe — Click the Subscribe button to follow the alert
- Unsubscribe — Click Unsubscribe to stop receiving updates
- The subscriber count is visible in the alert list and detail page
Notifications
When configuring an alert, you choose who receives email notifications when it fires.
Personal Accounts
Notifications are sent to the email associated with your account. No additional configuration needed.
Organization Accounts
Three distribution options:
| Option | Who Receives Notifications |
|---|---|
| Account owner only | Only the account owner |
| Owner & administrators | Owner plus all users with administrator role |
| Selected members | Specific organization members you choose |
When "Selected members" is chosen, a searchable multi-select dropdown appears where you pick individual team members.
Example: For a critical production agent, select "Owner & administrators" so the on-call team is always notified. For a lower-priority staging agent, select "Account owner only."
Permissions
| Role | View Alerts | Configure Alerts | Change Status / Comment |
|---|---|---|---|
| Owner | ✅ | ✅ | ✅ |
| Admin | ✅ | ✅ | ✅ |
| Editor | ✅ | ✅ | ✅ |
| Viewer | ❌ Redirected | ❌ 403 Forbidden | ❌ 403 Forbidden |
Examples
Example: Critical Agent Monitoring
For a daily financial reporting agent that must never fail silently:
Account-wide alerts (baseline):
• Consecutive Failures: count=5, cooldown=120 min
• Error Rate Spike: rate=0.3, window=50, cooldown=60 min
Per-agent overrides (Financial Reporter):
• Run Failed: enabled, cooldown=10 min
• Consecutive Failures: count=2, cooldown=15 min
• Slow Run: multiplier=1.5, min_duration=60s, cooldown=30 min
Notification: Owner & administrators
Example: High-Volume Agent with Burst Detection
For a customer support bot handling thousands of runs per day:
Per-agent configuration:
• Error Rate Spike: rate=0.10 (10%), window=100, cooldown=60 min
• Run Burst: max_runs=500, window=5 min, cooldown=30 min
• Slow Run: multiplier=3, min_duration=15s, min_runs=50, cooldown=120 min
Notification: Selected members → [ops-team@company.com, lead@company.com]
Example: Content Source Freshness Monitoring
For RSS feeds that must stay current:
Account-wide source alerts:
• Consecutive Pull Failures: count=3, cooldown=60 min
• Pull Error Rate Spike: rate=0.5, window=10, cooldown=120 min
Per-source overrides (Primary News Feed):
• Pull Failed: enabled, cooldown=15 min
• Consecutive Pull Failures: count=2, cooldown=30 min
Notification: Account owner only
Next Steps
- Dashboard — Monitor aggregated metrics for agents, sources, and credits
- Agents — Learn about creating and configuring agents
- Content Sources — Set up sources for your knowledge bases
- Organizations — Manage team members and notification recipients