restart failed processes in parallel flow

Registered by suzhengwei

In masakari process-monitor, when a process defined in the config file fails,
the monitor tries to restart it a few times. If the last restart attempt also
fails, the monitor sends a process-failed notification.

Usually, process-monitor is used to watch the status of several processes.
Restarting failed processes in a linear flow, one after another, has two
potential defects.

First, the monitor may detect two failed processes in the same poll period,
for example process A and process B, where process A depends on process B.
It may try to restart process A first and process B only afterwards. Process A
cannot come up while process B is down, and the two processes fail to restart.

Second, each process is allowed 3 restart attempts with a 10-second interval
per attempt, so restarting one failed process can take 30 seconds.
If N processes fail at the same time, restarting them takes up to 30*N seconds
in the worst case. Recovery may be very slow.

The process-monitor needs improvement: it should try to restart failed
processes in a parallel flow.
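
A minimal sketch of what a parallel flow could look like; it is not the
masakari implementation. It assumes a hypothetical restart_once(name) callable
that performs one restart attempt and returns True on success. The helper names
restart_with_retries and restart_all_parallel are illustrative as well. Each
failed process gets its own worker thread and its own retry loop, so a
dependent process can succeed on a later attempt once its dependency is back,
and the worst case stays near retries * interval instead of growing with N.
The sketch sleeps after every failed attempt, which is an assumption chosen so
that 3 attempts at 10 seconds match the 30-second figure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def restart_with_retries(name, restart_once, retries=3, interval=10.0):
    """Try to restart one process; return True on success, False otherwise.

    restart_once is a hypothetical callable: restart_once(name) -> bool.
    """
    for _ in range(retries):
        if restart_once(name):
            return True
        # Assumption: wait after every failed attempt, including the last.
        time.sleep(interval)
    return False


def restart_all_parallel(names, restart_once, retries=3, interval=10.0):
    """Restart every failed process in its own worker thread.

    Returns a dict mapping each process name to True (restarted) or
    False (still down, so a failure notification would be due).
    """
    if not names:
        return {}
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {
            name: pool.submit(restart_with_retries, name, restart_once,
                              retries, interval)
            for name in names
        }
        # Collect results; all retry loops run concurrently.
        return {name: future.result() for name, future in futures.items()}
```

With the values from the description (retries=3, interval=10), N processes
that all fail to restart would take about 30 seconds in total under this
sketch, rather than 30*N seconds.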

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: suzhengwei
Direction: Needs approval
Assignee: suzhengwei
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

Addressed by: https://review.opendev.org/710191
    restart failed processes in parallel flow
