Resolve combination alarm dependency chain

Registered by ZhiQiang Fan

Combination alarms introduce dependencies between alarms, but for now we don't handle them properly. The main problems are:

1) The dependency chain can be too long, so the state of a combination alarm is not evaluated in a timely manner; in the worst case, combination alarms are never updated.
2) Updating an alarm can introduce a dead loop (a circular dependency), so the state of the combination alarm cannot be set properly.
3) Deleting an alarm can break the dependency chain.

For problem 1):

For example, combination alarm ca01 specifies threshold alarms ta02 and ta03 in its alarm_ids, and ca04 specifies ca01 and ta05 in its alarm_ids. Currently, alarms are updated in random order, so even if all alarms are stable, ca04 may need 3 periods to be updated in the worst case: ta02 and ta03 in the first period, ca01 and ta05 in the second, and finally ca04 in the third.

If the required alarms are not stable, ca04 may never be updated in the worst case. For example, ta02 and ta03 are in alarm state in the first period but ok in the second; if ca01 only picks up the second-period state of ta02 and ta03, it stays ok, and the alarm state never propagates to ca04.
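To make the two scenarios above concrete, here is a minimal Python simulation sketch. Everything in it is hypothetical and not Ceilometer code: the `DEPS` dict (mapping an alarm id to its alarm_ids), the assumption that combination alarms use the "and" operator, and the assumption that every alarm starts in the "ok" state. Each period evaluates every alarm once in a given order, and a combination alarm reads whatever state its required alarms currently have stored.

```python
from typing import Dict, List

# Hypothetical dependency graph from the example: alarm id -> its alarm_ids.
DEPS: Dict[str, List[str]] = {
    "ta02": [], "ta03": [], "ta05": [],
    "ca01": ["ta02", "ta03"],
    "ca04": ["ca01", "ta05"],
}


def evaluate(name: str, state: Dict[str, str], measured: Dict[str, str]) -> str:
    """Return the new state of one alarm for the current period."""
    required = DEPS[name]
    if not required:
        # Threshold alarm: state comes straight from this period's measurement.
        return measured[name]
    # Combination alarm (assumed "and" operator): reads the *stored* states of
    # its required alarms, whichever period they were last updated in.
    return "alarm" if all(state[r] == "alarm" for r in required) else "ok"


def simulate(order: List[str], periods: List[Dict[str, str]], target: str = "ca04") -> List[str]:
    """Evaluate every alarm once per period in `order`; return target's state per period."""
    state = {name: "ok" for name in DEPS}  # assumed initial state
    history = []
    for measured in periods:
        for name in order:
            state[name] = evaluate(name, state, measured)
        history.append(state[target])
    return history


ALARM = {"ta02": "alarm", "ta03": "alarm", "ta05": "alarm"}
RECOVERED = {"ta02": "ok", "ta03": "ok", "ta05": "alarm"}
DEPTH_ORDER = ["ta02", "ta03", "ta05", "ca01", "ca04"]

# Stable inputs, worst-case order: the state climbs one level per period.
print(simulate(["ca04", "ca01", "ta02", "ta03", "ta05"], [ALARM] * 3))  # ['ok', 'ok', 'alarm']
# Short-lived alarm, unlucky order: ca01 never sees ta02 and ta03 alarmed together.
print(simulate(["ca04", "ta02", "ca01", "ta03", "ta05"], [ALARM, RECOVERED]))  # ['ok', 'ok']
# Same inputs in dependency order: ca04 reacts in the very first period.
print(simulate(DEPTH_ORDER, [ALARM, RECOVERED]))  # ['alarm', 'ok']
```

In this model, the worst-case order delays ca04 by one period per dependency level, and the unlucky interleaving hides the short-lived alarm from ca01 entirely; evaluating in dependency order avoids both.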

The root cause is that a combination rule specifies a dependency chain, but the evaluator just ignores that relationship.

I think we should make the evaluator respect that relationship. The basic idea is:

1. Give every alarm a derived (virtual) attribute **depth**: threshold alarms have depth 0, combination alarms that only require threshold alarms have depth 1, and every other alarm has depth = max{depth of required alarms} + 1.
2. The evaluator evaluates alarms by depth in ascending order.
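The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual Ceilometer patch: the function names `compute_depths` and `evaluation_order` are hypothetical, alarms are represented as a plain dict mapping an alarm id to its alarm_ids, and the graph is assumed to be acyclic (problem 2 below).

```python
from typing import Dict, List


def compute_depths(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Depth 0 for threshold alarms, else max(depth of required alarms) + 1.

    Assumes the graph is acyclic; a cycle would recurse until Python raises
    RecursionError.
    """
    depths: Dict[str, int] = {}

    def depth(name: str) -> int:
        if name not in depths:
            required = deps[name]
            depths[name] = 0 if not required else max(depth(r) for r in required) + 1
        return depths[name]

    for name in deps:
        depth(name)
    return depths


def evaluation_order(deps: Dict[str, List[str]]) -> List[str]:
    """Sort alarms by ascending depth; ties are broken by id for determinism."""
    depths = compute_depths(deps)
    return sorted(deps, key=lambda name: (depths[name], name))


deps = {
    "ta02": [], "ta03": [], "ta05": [],
    "ca01": ["ta02", "ta03"],
    "ca04": ["ca01", "ta05"],
}
print(compute_depths(deps))    # {'ta02': 0, 'ta03': 0, 'ta05': 0, 'ca01': 1, 'ca04': 2}
print(evaluation_order(deps))  # ['ta02', 'ta03', 'ta05', 'ca01', 'ca04']
```

Sorting by depth is one way to get a topological order; since every alarm's depth is strictly greater than that of each alarm it requires, all required alarms are evaluated first within the same period.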

This problem can be addressed by:
https://bugs.launchpad.net/ceilometer/+bug/1310500

For problem 2):

For example:
assume we have threshold alarms C and D;
then we create combination alarm A with alarm_ids = C, D;
then we create combination alarm B with alarm_ids = A;
then we update alarm A with alarm_ids = B.
Now, when the evaluator evaluates alarm states, the two combination alarms form a dead loop: A depends on B, and B depends on A.
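One way to reject such an update is to check, before saving it, whether the alarm being updated is reachable from any of its proposed new alarm_ids. The Python sketch below is a minimal illustration under that assumption; `creates_cycle` and the dict-based graph are hypothetical, not the fix that was merged for the bugs listed below.

```python
from typing import Dict, List, Set


def creates_cycle(deps: Dict[str, List[str]], alarm_id: str, new_alarm_ids: List[str]) -> bool:
    """Return True if setting alarm_id's alarm_ids to new_alarm_ids closes a loop.

    The update is cyclic exactly when alarm_id is reachable from one of the
    proposed required alarms through the existing dependency graph.
    """
    stack = list(new_alarm_ids)
    seen: Set[str] = set()
    while stack:
        current = stack.pop()
        if current == alarm_id:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(deps.get(current, []))
    return False


deps = {"C": [], "D": [], "A": ["C", "D"], "B": ["A"]}
print(creates_cycle(deps, "A", ["B"]))       # True: A -> B -> A
print(creates_cycle(deps, "B", ["A", "C"]))  # False
```

The walk stops as soon as it reaches alarm_id, so the alarm's old alarm_ids never influence the result; only the proposed ones do.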

This problem can be addressed by:
https://bugs.launchpad.net/ceilometer/+bug/1309182
https://bugs.launchpad.net/ceilometer/+bug/1304419

For problem 3):

For example (a -> b means a depends on b):
alarms ca01 -> ta01, ca02 -> ca01, ca03 -> ca02
If we then delete any one of ta01, ca01 or ca02, ca03 will be dead. We should forbid this, especially since an admin can specify an alarm dependency chain across different projects, or we should at least warn users about this dangerous operation.
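A delete guard needs the reverse relationship: every alarm that directly or transitively requires the one being deleted. The Python sketch below assumes the same hypothetical dict-based graph; `dependents` and `check_delete` are illustrative names, and raising `ValueError` stands in for "forbid". Logging a warning instead would be the softer option mentioned above.

```python
from typing import Dict, List, Set


def dependents(deps: Dict[str, List[str]], alarm_id: str) -> Set[str]:
    """All alarms that directly or transitively list alarm_id in their alarm_ids."""
    # Reverse edges: required alarm -> alarms that require it.
    reverse: Dict[str, List[str]] = {}
    for name, required in deps.items():
        for r in required:
            reverse.setdefault(r, []).append(name)

    found: Set[str] = set()
    stack = list(reverse.get(alarm_id, []))
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        stack.extend(reverse.get(current, []))
    return found


def check_delete(deps: Dict[str, List[str]], alarm_id: str) -> None:
    """Refuse to delete an alarm that other alarms still depend on."""
    affected = dependents(deps, alarm_id)
    if affected:
        raise ValueError(f"cannot delete {alarm_id}: required by {sorted(affected)}")


deps = {"ta01": [], "ca01": ["ta01"], "ca02": ["ca01"], "ca03": ["ca02"]}
print(sorted(dependents(deps, "ta01")))  # ['ca01', 'ca02', 'ca03']
check_delete(deps, "ca03")  # nothing depends on ca03, so no error is raised
```

Because the search is transitive, the error message lists the whole affected chain, which is what an admin needs to see when alarms span several projects.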

This problem will be addressed by:
https://bugs.launchpad.net/ceilometer/+bug/1306407

Blueprint information

Status:
Complete
Approver:
None
Priority:
Undefined
Drafter:
ZhiQiang Fan
Direction:
Needs approval
Assignee:
ZhiQiang Fan
Definition:
Superseded
Series goal:
None
Implementation:
Unknown
Milestone target:
None
Completed by
gordon chung

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/resolve-combination-alarm-dependency-chain,n,z

Addressed by: https://review.openstack.org/88431
    Avoid dead loop when update combination alarm

Addressed by: https://review.openstack.org/89330
    Evaluate alarms by their dependency order

let's address this with blueprint composite-threshold-rule-alarm -- gordc (9.11.15)
