Retry check on host failure
If the platform's network is unstable, a single check can produce an erroneous judgement that a host has failed, which is not expected.
The following log entries recur repeatedly, and we cannot tolerate such frequent host recovery.
Oct 22 20:09:36 compute02 corosync[83721]: [TOTEM] A new membership [172.28.69.15:88] was formed, Members left:9
Oct 22 20:09:36 compute02 corosync[83721]: Failed to received the leave message. failed:9
Oct 22 20:09:36 compute02 cib[83738]: notice: Node compute07 state is now lost
...
Oct 22 20:09:47 compute02 corosync[83721]: [TOTEM] A new membership [172.28.69.15:98] was formed, Members joined:9
Oct 22 20:09:47 compute02 pacemakerd[83737]: error: Node compute07[9] appears to be online even though we think it is dead
Oct 22 20:09:47 compute02 pacemakerd[83737]: notice: Node compute07 state is now member
...
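Retrying the check several times is more reliable than checking once: a node that drops out only briefly, as compute07 does for about 11 seconds in the log above, is then not treated as failed.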
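The idea can be sketched roughly as follows. This is only an illustration, not the code that was merged: the function name host_confirmed_down, the is_host_online callable, and the retries and interval parameters are all hypothetical names chosen for this sketch. The host is reported as down only when every check in the series sees it offline.

```python
import time
from typing import Callable


def host_confirmed_down(
    is_host_online: Callable[[], bool],
    retries: int = 3,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the host is seen offline on every check.

    All names here are hypothetical; is_host_online stands in for
    whatever probe the monitor really uses (for example, cluster
    membership status).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        if is_host_online():
            # One successful check is enough to cancel the failure report.
            return False
        if attempt < retries - 1:
            # Wait before the next check so a short network flap can pass.
            sleep(interval)
    return True


# A flapping host: offline on the first check, back online on the second.
observations = iter([False, True])
print(host_confirmed_down(lambda: next(observations), retries=3, interval=0))
# prints: False
```

With a single check the flapping host above would have been reported as failed and recovery would have been triggered. The cost of retrying is a longer detection time for real failures, roughly (retries - 1) * interval.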
Blueprint information
- Status: Complete
- Approver: Radosław Piliszek
- Priority: Undefined
- Drafter: suzhengwei
- Direction: Needs approval
- Assignee: suzhengwei
- Definition: Approved
- Series goal: None
- Implementation: Implemented
- Milestone target: None
- Started by: suzhengwei
- Completed by: Radosław Piliszek
Whiteboard
Gerrit topic: https:/
Addressed by: https:/
spec for continuous check to determine host status
Addressed by: https:/
continuous check to determine host status