restart failed processes in parallel flow
In masakari process-monitor, once a process defined in the config file fails,
the monitor tries to restart it a few times. If the process still cannot be
restarted, the monitor sends a process-failure notification.
Usually, process-monitor is used to watch the status of several processes.
Restarting failed processes in a linear (sequential) flow has two potential
defects.
First, the monitor may detect two failed processes in the same poll period,
for example process A and process B, where process A depends on process B.
The monitor may try to restart process A first and then process B, in which
case both processes fail to restart.
Second, a process is allowed 3 restart attempts at 10-second intervals, so
restarting one failed process can take up to 30 seconds. If N processes fail
at the same time, restarting them takes 30*N seconds in the worst case, so
recovery may be very slow.
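The worst-case figures above can be checked with a small calculation. This is only an illustrative sketch: the function name and parameters are hypothetical, and the defaults (3 attempts, 10-second interval) are the values quoted in this description, not values read from masakari's configuration.

```python
def worst_case_seconds(num_failed, retries=3, interval=10, parallel=False):
    """Upper bound on the time spent restarting failed processes.

    Hypothetical helper for illustration only. Assumes every process
    uses all of its attempts and waits `interval` seconds per attempt.
    """
    per_process = retries * interval
    if parallel:
        # All processes are retried at the same time.
        return per_process if num_failed > 0 else 0
    # Linear flow: processes are handled one after another.
    return per_process * num_failed


print(worst_case_seconds(5))                 # linear flow, 5 processes
print(worst_case_seconds(5, parallel=True))  # parallel flow, 5 processes
```

With five failed processes the linear bound is 150 seconds, while the parallel bound stays at 30 seconds regardless of N.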
The process-monitor needs improvement: it should try to restart failed
processes in a parallel flow.
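One possible shape for a parallel restart flow is sketched below using Python's standard `concurrent.futures` module. It is a minimal sketch under stated assumptions, not masakari's implementation: `restart_with_retries`, `restart_in_parallel`, and the `restart_fn` callback are hypothetical names, and `restart_fn(name)` is assumed to return True when the process was restarted successfully.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def restart_with_retries(name, restart_fn, retries=3, interval=10):
    """Try to restart one process; return True on success.

    `restart_fn` is a hypothetical callback, not a masakari API.
    """
    for _ in range(retries):
        if restart_fn(name):
            return True
        # Wait before the next attempt (or before giving up).
        time.sleep(interval)
    return False


def restart_in_parallel(names, restart_fn, retries=3, interval=10):
    """Restart all failed processes concurrently.

    Returns the names that still failed after all attempts, i.e. the
    processes for which a failure notification would be sent.
    """
    if not names:
        return []
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = list(pool.map(
            lambda n: restart_with_retries(n, restart_fn, retries, interval),
            names))
    return [n for n, ok in zip(names, results) if not ok]
```

Because each process is retried independently, a process that depends on another one (process A on process B in the example above) may succeed on a later attempt once its dependency is back up, and the total recovery time is bounded by one process's retry budget rather than by the number of failed processes.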
Blueprint information
- Status: Not started
- Approver: None
- Priority: Undefined
- Drafter: suzhengwei
- Direction: Needs approval
- Assignee: suzhengwei
- Definition: New
- Series goal: None
- Implementation: Unknown
- Milestone target: None
- Started by:
- Completed by:
Whiteboard
Addressed by: https:/