Signal watches your entire fleet and tells you the second something goes wrong. Disk filling up, service down, SSL expiring, agent stuck — get alerted through the channel you choose, not the one a vendor forces on you.
Critical, warning, info, resolved. Each tier routes differently. Critical pages you at 3am. Info waits for your morning dashboard.
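A minimal sketch of what that tiering could look like in practice, assuming a simple severity-to-channel table. The channel names and the ROUTES structure are illustrative, not Signal's actual configuration schema:

```python
# Hypothetical severity-to-channel routing table; names are illustrative,
# not Signal's configuration format.
ROUTES = {
    "critical": ["sms", "roundtrip"],   # page immediately, any hour
    "warning":  ["roundtrip", "email"], # surface soon, no paging
    "info":     ["dashboard"],          # batch for the morning review
    "resolved": ["dashboard"],          # close the loop quietly
}

def channels_for(severity: str) -> list[str]:
    """Return the delivery channels configured for an alert severity."""
    return ROUTES.get(severity, ["dashboard"])
```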
Lucidia reads every alert before you do. Correlates events across nodes. "Cecilia disk full" + "MinIO write errors" = one root cause, one alert.
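One way to picture the correlation step: group events that hit the same node within a short window, so two symptoms collapse into one alert. This is a toy sketch of the idea only; Lucidia's actual analysis is richer, and the window size and event shape here are assumptions:

```python
from collections import defaultdict
from datetime import timedelta

# Toy correlation: events on the same node within WINDOW of each other
# are folded into one group, illustrating "two symptoms, one root cause".
WINDOW = timedelta(minutes=5)

def correlate(events: list[dict]) -> list[list[dict]]:
    """Group events by node, splitting groups on gaps larger than WINDOW.

    Each event is assumed to be a dict with "node" and "time" keys.
    """
    by_node = defaultdict(list)
    for e in sorted(events, key=lambda e: e["time"]):
        by_node[e["node"]].append(e)

    groups = []
    for node_events in by_node.values():
        current = [node_events[0]]
        for e in node_events[1:]:
            if e["time"] - current[-1]["time"] <= WINDOW:
                current.append(e)
            else:
                groups.append(current)
                current = [e]
        groups.append(current)
    return groups
```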
Push to RoundTrip, email, webhook, NATS, or SMS. Stack multiple channels per severity. Never miss a critical alert.
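For the webhook channel, delivery can be as simple as an HTTP POST with a JSON body. The payload shape and endpoint below are assumptions, not Signal's documented wire format:

```python
import json
import urllib.request

def post_webhook(url: str, alert: dict) -> None:
    """POST an alert as JSON; non-2xx responses raise for the caller to retry."""
    req = urllib.request.Request(
        url,
        data=json.dumps(alert).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # drain the response body
```

Stacking channels is then just a loop over whatever is configured for that severity, calling one sender per channel.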
If nobody acknowledges a critical alert in 5 minutes, escalate. Retry on a different channel. Keep escalating until someone responds.
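The 5-minute timeout and retry-on-a-different-channel behavior come from the text; everything else in this sketch is assumed. send() and acknowledged() stand in for Signal's internals, and the ladder order is hypothetical:

```python
import itertools
import time

ESCALATION_LADDER = ["roundtrip", "email", "sms"]  # hypothetical order
ACK_TIMEOUT = 5 * 60  # seconds to wait for an acknowledgement

def escalate(alert_id: str, send, acknowledged) -> None:
    """After each unacknowledged ACK_TIMEOUT, retry on the next channel."""
    for step in itertools.count():
        time.sleep(ACK_TIMEOUT)
        if acknowledged(alert_id):
            return
        send(ESCALATION_LADDER[step % len(ESCALATION_LADDER)], alert_id)
```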
Every alert logged with timestamp, severity, source, and resolution. Full audit trail. See patterns before they become outages.
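A plausible shape for one logged record, using only the fields named above. The field names and types are illustrative, not Signal's storage schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AlertRecord:
    severity: str                  # critical | warning | info | resolved
    source: str                    # node or service that raised the alert
    message: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    resolution: str | None = None  # filled in when the alert is closed
```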
Maintenance windows, one-click snooze, auto-silence for known issues. No alert fatigue. Only real problems reach you.
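The three suppression paths could compose like this sketch, assuming hypothetical structures for maintenance windows, snoozes, and known-issue patterns:

```python
from datetime import datetime

def should_suppress(alert: dict, now: datetime,
                    windows: list[tuple[datetime, datetime]],
                    snoozed: set[str],
                    known_issues: list[str]) -> bool:
    """Return True if the alert should be silenced rather than delivered."""
    if any(start <= now <= end for start, end in windows):
        return True   # inside a maintenance window
    if alert["source"] in snoozed:
        return True   # one-click snooze on this source
    # auto-silence: message matches a known-issue pattern
    return any(pat in alert["message"] for pat in known_issues)
```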
Example alerts:
- SSH connection timeout after 30s. Last seen 2h ago. Ollama, MinIO, PostgreSQL affected.
- Read errors increasing. 847 bad sectors detected. Backup completed, replacement recommended.
- Let's Encrypt auto-renewal triggered for 12 domains on Gematria. Caddy handling rotation.
- Disk usage dropped from 89% to 64% after log rotation. Threshold cleared.
- Node unresponsive to WireGuard ping. Physical access required for recovery.
Write rules in plain English or code. Signal compiles them into efficient watchers (a toy sketch of the idea follows the examples below).
- disk > 85%: Alert when any node disk usage exceeds 85%
- ping timeout 30s: Critical alert if any fleet node stops responding
- ssl < 7d: Warning when SSL certificates expire within 7 days
- agent idle > 1h: Notify when any agent has been idle for over an hour
- http 5xx > 10/min: Escalate if error rate spikes above 10 per minute
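To make "compiles them into efficient watchers" concrete, here is a toy compiler for the threshold-style rules above. The grammar and metric names are assumptions drawn from the examples, not Signal's actual rule language:

```python
import re

# Toy compiler for threshold rules like "disk > 85%". Units and suffixes
# after the number are ignored; this only handles simple comparisons.
RULE = re.compile(r"^(\w+)\s*(>|<)\s*([\d.]+)")

def compile_rule(text: str):
    """Turn 'metric > value' into a predicate over a metrics snapshot."""
    m = RULE.match(text)
    if not m:
        raise ValueError(f"unparseable rule: {text!r}")
    metric, op, value = m.group(1), m.group(2), float(m.group(3))
    if op == ">":
        return lambda metrics: metrics.get(metric, 0) > value
    return lambda metrics: metrics.get(metric, 0) < value

watch_disk = compile_rule("disk > 85%")
print(watch_disk({"disk": 91}))  # True: disk usage exceeds the threshold
```

A real watcher would evaluate such predicates against a stream of node metrics and hand matches to the routing and escalation layers sketched earlier.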