# Alerting Design

Alerting is built around alert rules, incidents, notification policies, and notification history.
## Alert Rules

An alert rule turns monitor status or metric data into an incident. Initial rule behavior should support:

- Failure thresholds
- Recovery notifications
- Cooldown
- Severity
- Acknowledge
- Silence
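The rule behaviors listed above could be sketched as a small state machine. This is only an illustration under stated assumptions: `AlertRule`, `RuleState`, and `evaluate` are hypothetical names, and "cooldown" is interpreted here as the minimum gap between repeat notifications for an open incident, which the design does not spell out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical rule shape; field names are illustrative assumptions."""
    failure_threshold: int = 3                    # consecutive failures before an incident opens
    cooldown: timedelta = timedelta(minutes=10)   # assumed: minimum gap between repeat notifications
    severity: str = "warning"
    notify_on_recovery: bool = True


@dataclass
class RuleState:
    """Mutable per-monitor state tracked between checks (hypothetical)."""
    consecutive_failures: int = 0
    incident_open: bool = False
    acknowledged: bool = False
    silenced: bool = False
    last_notified: Optional[datetime] = None


def evaluate(rule: AlertRule, state: RuleState, check_ok: bool, now: datetime) -> Optional[str]:
    """Apply one check result; return 'opened', 'reminder', 'recovered', or None."""
    if check_ok:
        state.consecutive_failures = 0
        if not state.incident_open:
            return None
        # A passing check closes the incident and clears acknowledgement.
        state.incident_open = False
        state.acknowledged = False
        state.last_notified = None
        if rule.notify_on_recovery and not state.silenced:
            return "recovered"
        return None

    state.consecutive_failures += 1
    if state.consecutive_failures < rule.failure_threshold:
        return None  # below the failure threshold, nothing to do yet

    if not state.incident_open:
        state.incident_open = True
        if state.silenced:
            return None  # incident is recorded, but no notification is sent
        state.last_notified = now
        return "opened"

    # Incident already open: repeat only when unacknowledged, unsilenced,
    # and the cooldown has elapsed since the last notification.
    if state.acknowledged or state.silenced:
        return None
    if state.last_notified is None or now - state.last_notified >= rule.cooldown:
        state.last_notified = now
        return "reminder"
    return None
```

In this sketch, acknowledging or silencing suppresses notifications without closing the incident; only a passing check resolves it. Whether silence should also suppress the recovery message is a design choice the text leaves open.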
## Incidents

Incidents represent active or historical alert events. They include opened time, resolved time, current status, severity, related asset, related monitor, related alert rule, notification history, acknowledgement, and silence state.
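One possible shape for an incident record, mirroring the fields listed above. All class, field, and status names here (`Incident`, `IncidentStatus`, `NotificationRecord`, and so on) are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional


class IncidentStatus(Enum):
    # Assumed status values; the design only says "current status".
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


@dataclass
class NotificationRecord:
    """One entry in an incident's notification history (hypothetical)."""
    channel: str
    sent_at: datetime
    delivered: bool


@dataclass
class Incident:
    asset_id: str
    monitor_id: str
    alert_rule_id: str
    severity: str
    opened_at: datetime
    resolved_at: Optional[datetime] = None
    status: IncidentStatus = IncidentStatus.OPEN
    acknowledged_by: Optional[str] = None
    silenced_until: Optional[datetime] = None
    notifications: List[NotificationRecord] = field(default_factory=list)

    def acknowledge(self, user: str) -> None:
        # Only an open incident can move to acknowledged.
        if self.status is IncidentStatus.OPEN:
            self.status = IncidentStatus.ACKNOWLEDGED
            self.acknowledged_by = user

    def resolve(self, now: datetime) -> None:
        self.status = IncidentStatus.RESOLVED
        self.resolved_at = now

    def duration(self, now: datetime) -> timedelta:
        # Resolved incidents report a fixed duration; active ones grow with `now`.
        return (self.resolved_at or now) - self.opened_at
```

Keeping `resolved_at` optional lets the same record represent both active and historical incidents, as the description requires.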
## Notifications

Initial channels:

- Email / SMTP
- Mattermost incoming webhook
- Zoom Team Chat incoming webhook
- Generic webhook

Alert messages should be human-readable and include asset, check, status, duration, timestamps, and a link back to OrbitalWard.
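A minimal sketch of how such a message might be assembled and wrapped for one channel. The function names, the line layout, and the example URL are assumptions. The only external detail relied on is that Mattermost incoming webhooks accept a JSON body with a `text` field; other channels would need their own payload shapes.

```python
import json
from datetime import datetime, timedelta, timezone


def format_duration(delta: timedelta) -> str:
    """Render a duration compactly, e.g. '7m 30s' or '2h 5m'."""
    total = int(delta.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m"
    if minutes:
        return f"{minutes}m {seconds}s"
    return f"{seconds}s"


def build_alert_message(asset: str, check: str, status: str,
                        opened_at: datetime, now: datetime,
                        incident_url: str) -> str:
    """Build a plain-text alert body (hypothetical layout)."""
    lines = [
        f"[{status.upper()}] {asset}: {check}",
        f"Duration: {format_duration(now - opened_at)}",
        f"Opened: {opened_at.isoformat()}",
        f"Updated: {now.isoformat()}",
        f"Details: {incident_url}",
    ]
    return "\n".join(lines)


def mattermost_payload(message: str) -> str:
    """Wrap the message as a JSON body with a 'text' field."""
    return json.dumps({"text": message})


# Example values; the URL is a placeholder, not a real OrbitalWard address.
opened = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
now = opened + timedelta(minutes=7, seconds=30)
message = build_alert_message(
    "web-01", "HTTPS check", "down", opened, now,
    "https://orbitalward.example/incidents/42",
)
payload = mattermost_payload(message)
```

Building the plain-text body once and adapting it per channel keeps email, chat, and generic webhook notifications consistent with each other.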