Alarming refactoring

Registered by Swann Croiset

To enable several use cases, we must rework the alarming framework.

Blueprint information

Status: Complete
Approver: None
Priority: Essential
Drafter: Swann Croiset
Direction: Approved
Assignee: Swann Croiset
Definition: Review
Series goal: Accepted for 1.0
Implementation: Implemented
Milestone target: 1.0.0
Started by: Swann Croiset
Completed by: Swann Croiset

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/alarming-refactoring,n,z

Addressed by: https://review.openstack.org/368928
    Alarm definition refactoring

Addressed by: https://review.openstack.org/369583
    Do not send AFD to Nagios when activate_alerting=false

Addressed by: https://review.openstack.org/369584
    Do not send GSE to Nagios when activate_alerting=false

Addressed by: https://review.openstack.org/369585
    Removed old hiera data

Addressed by: https://review.openstack.org/369587
    Alarming refactoring

Addressed by: https://review.openstack.org/369588
    Support activate_alarming and enable_notification

Addressed by: https://review.openstack.org/370276
    Support activate_alerting and enable_notification properties for GSE

Addressed by: https://review.openstack.org/370277
    Add default AFD for unknown fuel roles

Addressed by: https://review.openstack.org/371345
    Simplify AFD alarm field structure

Addressed by: https://review.openstack.org/371346
    Send GSE service clusters status to alerting

Gerrit topic: https://review.openstack.org/#q,topic:alarming-refactoring,n,z

Addressed by: https://review.openstack.org/371433
    Avoid alarm flapping for Ceph OSD checks

Addressed by: https://review.openstack.org/371398
    Fix horizon alarm

Addressed by: https://review.openstack.org/371399
    Add alarm for Horizon HTTP 5xx errors

Addressed by: https://review.openstack.org/372515
    Monitor all partitions

Addressed by: https://review.openstack.org/372516
    Configure alarms for OSD disk(s)

Addressed by: https://review.openstack.org/372517
    Include Ceph OSD node to the storage cluster

Addressed by: https://review.openstack.org/372518
    Fix the InfluxDB VIP check to map the GSE configuration

Addressed by: https://review.openstack.org/373816
    Split top-level clusters health by (control|data)-plane

Addressed by: https://review.openstack.org/376433
    Make Pacemaker global status dependent on controller cluster status

Addressed by: https://review.openstack.org/376434
    Decouple aggregator election from Pacemaker resource

Addressed by: https://review.openstack.org/376563
    [WIP] remove no_data_policy=skip

Addressed by: https://review.openstack.org/384882
    Support alerting attribute per AFD

Addressed by: https://review.openstack.org/384883
    Support alerting attribute per AFD

Addressed by: https://review.openstack.org/384884
    Enable notifications for HDD errors

Addressed by: https://review.openstack.org/384885
    Revert "Remove the no_data_policy=skip for AFD"

Addressed by: https://review.openstack.org/385046
    Add no_data_policy=skip for all workers alarms

Addressed by: https://review.openstack.org/384552
    Fix rabbitmq-pacemaker related alarms

Addressed by: https://review.openstack.org/385820
    Do not send cluster AFDs to Nagios
