Signal watches your entire fleet and tells you the second something goes wrong. Disk filling up, service down, SSL expiring, agent stuck — get alerted through the channel you choose, not the one a vendor forces on you.
Critical, warning, info, resolved. Each tier routes differently. Critical pages you at 3am. Info waits for your morning dashboard.
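A minimal sketch of what that tiering could look like in practice, assuming a simple severity-to-channel table. The channel names and the ROUTES structure are illustrative, not Signal's actual configuration schema:

```python
# Hypothetical severity-to-channel routing table; names are illustrative,
# not Signal's configuration format.
ROUTES = {
    "critical": ["sms", "roundtrip"],   # page immediately, any hour
    "warning":  ["roundtrip", "email"], # surface soon, no paging
    "info":     ["dashboard"],          # batch for the morning review
    "resolved": ["dashboard"],          # close the loop quietly
}

def channels_for(severity: str) -> list[str]:
    """Return the delivery channels configured for an alert severity."""
    return ROUTES.get(severity, ["dashboard"])
```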
Lucidia reads every alert before you do. Correlates events across nodes. "Cecilia disk full" + "MinIO write errors" = one root cause, one alert.
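One way to picture the correlation step: group events that hit the same node within a short window, so two symptoms collapse into one alert. This is a toy sketch of the idea only; Lucidia's actual analysis is richer, and the window size and event shape here are assumptions:

```python
from collections import defaultdict
from datetime import timedelta

# Toy correlation: events on the same node within WINDOW of each other
# are folded into one group, illustrating "two symptoms, one root cause".
WINDOW = timedelta(minutes=5)

def correlate(events: list[dict]) -> list[list[dict]]:
    """Group events by node, splitting groups on gaps larger than WINDOW.

    Each event is assumed to be a dict with "node" and "time" keys.
    """
    by_node = defaultdict(list)
    for e in sorted(events, key=lambda e: e["time"]):
        by_node[e["node"]].append(e)

    groups = []
    for node_events in by_node.values():
        current = [node_events[0]]
        for e in node_events[1:]:
            if e["time"] - current[-1]["time"] <= WINDOW:
                current.append(e)
            else:
                groups.append(current)
                current = [e]
        groups.append(current)
    return groups
```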
Push to RoundTrip, email, webhook, NATS, or SMS. Stack multiple channels per severity. Never miss a critical alert.
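For the webhook channel, delivery can be as simple as an HTTP POST with a JSON body. The payload shape and endpoint below are assumptions, not Signal's documented wire format:

```python
import json
import urllib.request

def post_webhook(url: str, alert: dict) -> None:
    """POST an alert as JSON; non-2xx responses raise for the caller to retry."""
    req = urllib.request.Request(
        url,
        data=json.dumps(alert).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # drain the response body
```

Stacking channels is then just a loop over whatever is configured for that severity, calling one sender per channel.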
If nobody acknowledges a critical alert in 5 minutes, escalate. Retry on a different channel. Keep escalating until someone responds.
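The 5-minute timeout and retry-on-a-different-channel behavior come from the text; everything else in this sketch is assumed. send() and acknowledged() stand in for Signal's internals, and the ladder order is hypothetical:

```python
import itertools
import time

ESCALATION_LADDER = ["roundtrip", "email", "sms"]  # hypothetical order
ACK_TIMEOUT = 5 * 60  # seconds to wait for an acknowledgement

def escalate(alert_id: str, send, acknowledged) -> None:
    """After each unacknowledged ACK_TIMEOUT, retry on the next channel."""
    for step in itertools.count():
        time.sleep(ACK_TIMEOUT)
        if acknowledged(alert_id):
            return
        send(ESCALATION_LADDER[step % len(ESCALATION_LADDER)], alert_id)
```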
Every alert logged with timestamp, severity, source, and resolution. Full audit trail. See patterns before they become outages.
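A plausible shape for one logged record, using only the fields named above. The field names and types are illustrative, not Signal's storage schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AlertRecord:
    severity: str                  # critical | warning | info | resolved
    source: str                    # node or service that raised the alert
    message: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    resolution: str | None = None  # filled in when the alert is closed
```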
Maintenance windows, one-click snooze, auto-silence for known issues. No alert fatigue. Only real problems reach you.
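The three suppression paths could compose like this sketch, assuming hypothetical structures for maintenance windows, snoozes, and known-issue patterns:

```python
from datetime import datetime

def should_suppress(alert: dict, now: datetime,
                    windows: list[tuple[datetime, datetime]],
                    snoozed: set[str],
                    known_issues: list[str]) -> bool:
    """Return True if the alert should be silenced rather than delivered."""
    if any(start <= now <= end for start, end in windows):
        return True   # inside a maintenance window
    if alert["source"] in snoozed:
        return True   # one-click snooze on this source
    # auto-silence: message matches a known-issue pattern
    return any(pat in alert["message"] for pat in known_issues)
```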
Example alerts:
- SSH connection timeout after 30s. Last seen 2h ago. Ollama, MinIO, PostgreSQL affected.
- Read errors increasing. 847 bad sectors detected. Backup completed, replacement recommended.
- Let's Encrypt auto-renewal triggered for 12 domains on Gematria. Caddy handling rotation.
- Disk usage dropped from 89% to 64% after log rotation. Threshold cleared.
- Node unresponsive to WireGuard ping. Physical access required for recovery.
Write rules in plain English or code. Signal compiles them into efficient watchers (a toy sketch of the idea follows the examples below).
- disk > 85%: Alert when any node disk usage exceeds 85%
- ping timeout 30s: Critical alert if any fleet node stops responding
- ssl < 7d: Warning when SSL certificates expire within 7 days
- agent idle > 1h: Notify when any agent has been idle for over an hour
- http 5xx > 10/min: Escalate if error rate spikes above 10 per minute
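To make "compiles them into efficient watchers" concrete, here is a toy compiler for the threshold-style rules above. The grammar and metric names are assumptions drawn from the examples, not Signal's actual rule language:

```python
import re

# Toy compiler for threshold rules like "disk > 85%". Units and suffixes
# after the number are ignored; this only handles simple comparisons.
RULE = re.compile(r"^(\w+)\s*(>|<)\s*([\d.]+)")

def compile_rule(text: str):
    """Turn 'metric > value' into a predicate over a metrics snapshot."""
    m = RULE.match(text)
    if not m:
        raise ValueError(f"unparseable rule: {text!r}")
    metric, op, value = m.group(1), m.group(2), float(m.group(3))
    if op == ">":
        return lambda metrics: metrics.get(metric, 0) > value
    return lambda metrics: metrics.get(metric, 0) < value

watch_disk = compile_rule("disk > 85%")
print(watch_disk({"disk": 91}))  # True: disk usage exceeds the threshold
```

A real watcher would evaluate such predicates against a stream of node metrics and hand matches to the routing and escalation layers sketched earlier.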