retry check when host failure

Registered by suzhengwei

If the platform's network is unstable, a single check can wrongly judge a healthy host as failed, which is not the expected behavior.

The following log entries appear repeatedly, and we cannot tolerate such frequent host recovery.
Oct 22 20:09:36 compute02 corosync[83721]: [TOTEM] A new membership [172.28.69.15:88] was formed, Members left:9
Oct 22 20:09:36 compute02 corosync[83721]: Failed to received the leave message. failed:9
Oct 22 20:09:36 compute02 cib[83738]: notice: Node compute07 state is now lost
...
Oct 22 20:09:47 compute02 corosync[83721]: [TOTEM] A new membership [172.28.69.15:98] was formed, Members joined:9
Oct 22 20:09:47 compute02 pacemakerd[83737]: error: Node compute07[9] appears to be online even though we think it is dead
Oct 22 20:09:47 compute02 pacemakerd[83737]: notice: Node compute07 state is now member
...

Retrying the check several times is more reliable than checking only once.
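
As a rough illustration of the idea, the sketch below reports a host as down only when several consecutive checks all fail. It is not the implementation from the linked reviews; `host_is_down`, `check_online`, `retries`, `interval`, and `sleep` are hypothetical names chosen for this example.

```python
import time
from typing import Callable


def host_is_down(check_online: Callable[[], bool],
                 retries: int = 3,
                 interval: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True only if `retries` consecutive checks all fail.

    `check_online` is a hypothetical callable that returns True when the
    host currently looks online (for example, when it is a cluster member).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        if check_online():
            # The host answered, so treat earlier failures as a transient blip.
            return False
        if attempt < retries - 1:
            # Wait before the next check so a short network drop can recover.
            sleep(interval)
    return True
```

With `retries=3` and `interval=5.0`, for instance, a host that drops out and rejoins about 11 seconds later, as compute07 does in the log above, could be seen as online again by the third check if the first check ran soon after it left; suitable values depend on the platform.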

Blueprint information

Status: Complete
Approver: Radosław Piliszek
Priority: Undefined
Drafter: suzhengwei
Direction: Needs approval
Assignee: suzhengwei
Definition: Approved
Series goal: None
Implementation: Implemented
Milestone target: None
Started by: suzhengwei
Completed by: Radosław Piliszek

Whiteboard

Gerrit topic: https://review.opendev.org/#/q/topic:bp/retry-check-when-host-failure

Addressed by: https://review.opendev.org/761499
    spec for continuous check to determine host status

Addressed by: https://review.opendev.org/761704
    continuous check to determine host status
