Task scheduler focus on steadily completing WF executions

Registered by Winson Chan

Currently, action execution is scheduled essentially through RPC, with the message queue's FIFO ordering gating the action executions. Let's say I have a moderately complex WF with 10 steps that usually takes 10 seconds to complete. If I launch 100 of these at the same time, it'll take ALL 100 of these executions 15-20 minutes to complete. Let's say Mistral can only execute 100 tasks at any given time. This means the 100 WF executions progress together, 1 step at a time. Now imagine there are 1000 WF executions running in the system. It'll take a very long time to complete. Meanwhile the state of each WF is RUNNING, and users will wonder why it's taking so long or, worse, think that the WF execution is hanging.

IMO, Mistral should prioritize the completion of WF executions and should first try to execute tasks from older WF executions. This requires a combination of WF execution throttling and a different, priority-based task scheduler. Throttling is only 1 part of the solution. Mistral also needs a task scheduler that gives older WF executions priority, and it needs to clearly show whether a task is being delayed/postponed/rescheduled.

The reason I'm pushing for this is that with throttling alone, we are still going to run into the problem I highlighted above. A 10-second WF will take 15-20 minutes to finish because ALL the executions progress stepwise at the same rate, and all the users see is that these executions are RUNNING. There's no indication that the system is at capacity. When a user expects a WF to complete in 10 seconds and it has been running for 5 minutes, the user's perception is that the WF is stuck/hanging. If we give older WF executions priority and accurately describe the state of the executions, then the perception is that the system is working and WFs are being completed, although each takes longer. The same amount of work gets done, but the progress is clearer and WFs are completed steadily instead of in one big-bang completion.
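
The difference between the two orderings can be illustrated with a toy simulation. This is only a sketch and not Mistral code: the `simulate` function, its parameters, and the numbers used (100 executions of 10 sequential steps, a capacity of 10 tasks per tick, every task taking exactly one tick) are assumptions made for illustration.

```python
import heapq
from collections import deque


def simulate(n_workflows, steps, capacity, oldest_first):
    """Return, per workflow execution, the tick at which it completes.

    Hypothetical model: each execution has `steps` sequential tasks, the
    engine runs at most `capacity` tasks per tick, and a task takes one tick.
    Workflow index doubles as age: a lower index means an older execution.
    """
    done = [0] * n_workflows
    finished_at = [0] * n_workflows
    # Each execution starts with exactly one ready task (its first step).
    # A sorted list is a valid min-heap, so it works for both policies.
    ready = list(range(n_workflows))
    if not oldest_first:
        ready = deque(ready)

    tick = 0
    while ready:
        tick += 1
        # Pick this tick's batch before re-queueing anything, so that no
        # execution advances more than one step per tick.
        batch = []
        for _ in range(min(capacity, len(ready))):
            if oldest_first:
                batch.append(heapq.heappop(ready))  # oldest execution first
            else:
                batch.append(ready.popleft())       # plain FIFO order
        for wf in batch:
            done[wf] += 1
            if done[wf] == steps:
                finished_at[wf] = tick
            elif oldest_first:
                heapq.heappush(ready, wf)  # next step keeps the execution's age
            else:
                ready.append(wf)           # next step goes to the back of the queue
    return finished_at


fifo = simulate(100, 10, 10, oldest_first=False)
prio = simulate(100, 10, 10, oldest_first=True)
print("FIFO:         first done at tick", min(fifo), "last at", max(fifo))
print("Oldest first: first done at tick", min(prio), "last at", max(prio))
```

Under these assumptions both policies finish all the work at tick 100, but with FIFO no execution completes before tick 91, while with oldest-first ordering 10 executions complete every 10 ticks starting at tick 10.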

Blueprint information

Status: Not started
Approver: None
Priority: Medium
Drafter: Winson Chan
Direction: Needs approval
Assignee: Winson Chan
Definition: New
Series goal: None
Implementation: Not started
Milestone target: None
