Partial revert in parallel flow

Registered by Anastasia Karpinska

For flow checkpoints described in https://wiki.openstack.org/wiki/TaskFlow/Checkpointing, the current implementation of the multithreaded flow engine does not allow a single parallel branch to be retried.

Problem description
In the following example we have two parallel branches and a checkpoint that can retry a subflow. Currently, if Task B fails, the flow waits for Task C to finish and then reverts all executed tasks. Instead, only Task B should be reverted, and the revert should not wait for Task C. While Task C executes, the other branch should retry in parallel.

         --- Checkpoint --- Task B
Task A -|
         --- Task C

Solution
We can implement a smart revert that starts from the failed node, walks back through the graph, and reverts only the nodes that need to be reverted. Some tasks can be reverted while others are still executing.
In the given example, Task C executes while Task B is reverted. If the checkpoint itself fails, the flow reverts Task C as well.
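
The walk-back step can be illustrated with a small, self-contained sketch. This is not the TaskFlow implementation: the graph representation, the `nodes_to_revert` helper, and the node names are assumptions made for illustration, and the sketch only covers the partial-revert walk, not the full revert that follows when a checkpoint gives up.

```python
from collections import deque

# Hypothetical encoding of the example graph: node -> direct predecessors.
PREDECESSORS = {
    "Task A": [],
    "Checkpoint": ["Task A"],
    "Task B": ["Checkpoint"],
    "Task C": ["Task A"],
}
CHECKPOINTS = {"Checkpoint"}


def nodes_to_revert(failed, predecessors, checkpoints):
    """Walk back from the failed node, stopping at the nearest checkpoints.

    Returns (to_revert, reached): the nodes visited before a checkpoint
    was hit (failed node first) and the checkpoints that bounded the walk.
    A real engine would also have to order the reverts so that dependents
    are reverted before the tasks they depend on.
    """
    to_revert = []
    reached = set()
    seen = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        if node in checkpoints:
            # The checkpoint decides whether to retry; do not walk past it.
            reached.add(node)
            continue
        to_revert.append(node)
        for pred in predecessors[node]:
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return to_revert, reached


to_revert, reached = nodes_to_revert("Task B", PREDECESSORS, CHECKPOINTS)
print(to_revert, reached)
```

For a failure in Task B the walk stops at the checkpoint, so Task A and Task C are left alone and Task C can keep running while Task B is reverted and retried.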

In this case, however, the flow would be in the REVERTING and RUNNING states simultaneously. I propose not setting the REVERTING state for the flow at all; it should be a task state only. The flow should stay in the RUNNING state until it finishes with SUCCESS or FAILURE.

Blueprint information

Status: Complete
Approver: Ivan Melnikov
Priority: Medium
Drafter: None
Direction: Approved
Assignee: Anastasia Karpinska
Definition: Approved
Series goal: Accepted for 0.2
Implementation: Implemented
Milestone target: None
Started by: Anastasia Karpinska
Completed by: Anastasia Karpinska

Whiteboard

Might be possible to merge this into:

https://blueprints.launchpad.net/taskflow/+spec/reversion-strategies ??

Or maybe keep it, since it's a mix of checkpointing and reversion strategies.

(imelnikov): We can add dependencies from this bp to checkpointing and reversion strategies when we approve this. Or all three are the same thing in fact.

(imelnikov): I think the intended semantics of the flow states are not completely clear from the bp description above. I think it should be something like:
- REVERTING: the flow no longer has this state;
- RUNNING: something is going on: tasks are executing or reverting according to the flow definition and reversion strategy;
- SUCCESS: all tasks ran to success; if any tasks failed, they were reverted and run again according to the reversion strategy;
- REVERTED: one or more tasks failed and the flow was reverted to its initial state; all tasks are PENDING again;
- FAILURE: after some task execution or revert failed, the flow is stuck in an "intermediate" state and can neither proceed nor revert further; some tasks are still in SUCCESS or FAILURE states.
Did I get it right?
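
One way to read the state semantics listed above is as a function from task states to a flow state. The sketch below is only an illustration of that reading, not TaskFlow code: the `flow_state` helper and the plain-string task state names are assumptions, and it presumes the flow has already been started and that the task states are a consistent snapshot.

```python
def flow_state(task_states):
    """Derive a flow state from task states, per the proposed semantics.

    task_states: iterable of strings, assumed to be one of PENDING,
    RUNNING, REVERTING, SUCCESS or FAILURE (hypothetical names).
    """
    states = set(task_states)
    if not states:
        raise ValueError("flow has no tasks")
    # Any executing or reverting task keeps the whole flow RUNNING;
    # the flow itself never enters a REVERTING state.
    if states & {"RUNNING", "REVERTING"}:
        return "RUNNING"
    if states == {"SUCCESS"}:
        return "SUCCESS"
    # Everything is back to PENDING: the flow was fully reverted.
    if states == {"PENDING"}:
        return "REVERTED"
    # Nothing is active, yet the flow is neither complete nor fully
    # reverted: it is stuck in an intermediate state.
    return "FAILURE"


# Task C has finished while Task B is being reverted: still RUNNING.
print(flow_state(["SUCCESS", "SUCCESS", "REVERTING"]))
```

Under this reading, the simultaneous revert of Task B and execution of Task C never surfaces as a distinct flow state; the flow simply remains RUNNING until it settles.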

Gerrit topic: https://review.openstack.org/#q,topic:bp/smart-revert,n,z

Addressed by: https://review.openstack.org/59676
    Get rid of dependency counter in graph action

Addressed by: https://review.openstack.org/64029
    Add make_completed_future to async_utils

Gerrit topic: https://review.openstack.org/#q,topic:bp/subgraph-execution,n,z

Addressed by: https://review.openstack.org/71621
    Flow smart revert with retry controller

Gerrit topic: https://review.openstack.org/#q,topic:checkpoints,n,z
