Templated alarm descriptions for human readable alerts
As an SRE, I want alarm notifications to carry a human-readable description and a link to a playbook, so that I can resolve the root cause of the alarm easily.
Proposed Design
The alarm description field already provides enough space to accommodate actionable instructions for the recipients of the alarm. If descriptions could refer to attributes of the actual alarm, they could describe its root cause and link to additional information such as dashboards and playbooks.
The proposed solution consists of the following parts:
API
* add the alarm-description field to alarm objects in addition to alarm-definition-id and alarm-definitio
* support Jinja2 syntax in alarm descriptions to allow dynamic content
* make alarm attributes available for use in the description templates (most notably, expose the dimensions of the alarm)
* support simplified Markdown syntax in descriptions to permit hyperlinks and basic formatting
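As a rough illustration of the API side, the sketch below renders a description template against the dimensions of one alarm using the Jinja2 library. The function name `render_description` and the sample dimensions are hypothetical, and the use of a sandboxed environment with strict undefined handling is an assumption about how user-supplied templates might be treated, not part of the proposal.

```python
from jinja2 import StrictUndefined
from jinja2.sandbox import SandboxedEnvironment

# Assumption: descriptions are user-supplied, so they are rendered in a
# sandbox. StrictUndefined makes a misspelled variable raise an error
# instead of silently rendering as an empty string.
env = SandboxedEnvironment(undefined=StrictUndefined)


def render_description(template_text, dimensions):
    """Render an alarm description with the alarm's dimensions as variables."""
    return env.from_string(template_text).render(dimensions)


description = (
    "The consumer offsets {{consumer_group}} for {{topic}} "
    "are ahead of the actual queue contents."
)
# Hypothetical dimensions of one concrete alarm.
dimensions = {"consumer_group": "monasca-persister", "topic": "metrics"}

rendered = render_description(description, dimensions)
print(rendered)
```

Rendering at alarm creation time, rather than at notification time, would let the rendered text be stored on the alarm object itself; which of the two the proposal intends is not stated here.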
Notification
* support configuring Jinja2 templates for notifications
* make notification and alarm attributes available for use in the notification templates (most notably alarm-name, rendered description, state-change date, alarm-state, severity)
* support simplified Markdown syntax in notification templates to permit hyperlinks and basic formatting
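The proposal does not say which Markdown subset counts as "simplified". As one possible reading, the sketch below handles only hyperlinks and bold text and converts them to Slack's mrkdwn notation, where links are written `<url|label>` and bold uses single asterisks. The function name `markdown_to_slack` and the example URL are hypothetical.

```python
import re

# [label](url) -> <url|label>
_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
# **bold** -> *bold*
_BOLD = re.compile(r"\*\*(.+?)\*\*")


def markdown_to_slack(text):
    """Convert a small Markdown subset (links, bold) to Slack mrkdwn."""
    text = _LINK.sub(r"<\2|\1>", text)
    text = _BOLD.sub(r"*\1*", text)
    return text


example = "**Disk full** on host-1, see the [playbook](https://example.com/playbooks/disk)."
converted = markdown_to_slack(example)
print(converted)
```

Other notification channels (for example e-mail) would need their own conversion, which is one argument for keeping the supported subset small.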
Example:
Here is an example of how Slack alerting would work:
1. You have an alarm description in Markdown syntax with Jinja2 template variables:
The consumer offsets {{consumer_group}} for {{topic}} are ahead of the actual queue contents.
2. You have a channel template for Slack:
slack:
  timeout: 60
  ca_certs: "/etc/ssl/
  mime_type: application/json
  template:
    text: |
      {
        "username": "Monasca (de)",
        "icon_url": "…",
        "mrkdwn": true,
        "title": "{{ {'ALARM': '*Alarm triggered*', 'OK': 'Alarm cleared', 'UNDETERMINED'
        "text": "{% if state == 'ALARM' %}:bomb:
      }
3. You receive alarms like this:
*Monasca (de)*
*Alarm triggered*
The consumer offsets *monasca-persister* for *metrics* are ahead of the actual queue contents.
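The Slack template in step 2 selects the title with an inline dictionary lookup on the alarm state. The sketch below shows that technique in isolation with Jinja2 and parses the rendered text as JSON to check that it is well-formed. The template is a shortened, hypothetical variant of the one above: only the ALARM and OK titles are taken from the example, the fallback for other states is an assumption, and `build_payload` is an illustrative helper name.

```python
import json

from jinja2.sandbox import SandboxedEnvironment

env = SandboxedEnvironment()

# Hypothetical, shortened notification template. The tojson filter emits a
# quoted, escaped JSON string, so quotes or newlines in the description
# cannot break the payload.
PAYLOAD_TEMPLATE = """{
  "username": "Monasca (de)",
  "mrkdwn": true,
  "title": {{ {'ALARM': '*Alarm triggered*', 'OK': 'Alarm cleared'}.get(state, state) | tojson }},
  "text": {{ description | tojson }}
}"""


def build_payload(state, description):
    """Render the notification template and parse the result as JSON."""
    rendered = env.from_string(PAYLOAD_TEMPLATE).render(
        state=state, description=description
    )
    return json.loads(rendered)


payload = build_payload(
    "ALARM",
    "The consumer offsets *monasca-persister* for *metrics* "
    "are ahead of the actual queue contents.",
)
print(payload["title"])
print(payload["text"])
```

Using `tojson` instead of hand-written quotes around `{{ ... }}` is a design choice of this sketch; the original template places the expressions inside literal JSON strings.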
Blueprint information
- Status: Not started
- Approver: Roland Hochmuth
- Priority: High
- Drafter: jobrs
- Direction: Needs approval
- Assignee: jobrs
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: None
- Started by
- Completed by
Whiteboard
A first implementation for this is available for Slack here: https:/
Unfortunately, it is not part of a branch, so upstreaming requires an extra step in which the actual diff is extracted.
Gerrit topic: https:/
Addressed by: https:/
Support templated alarm descriptions and notification templates
Addressed by: https:/
Support templated alarm descriptions and notification templates