Resumption migrations

Registered by Ivan Melnikov

A simple migration layer can be used to upgrade flow or task details to newer versions. This will be useful during pre-engine activation to ensure that the data associated with a task or flow is compatible.

Blueprint information

Not started
Joshua Harlow
Needs approval
Series goal:
Milestone target:

Related branches



Proposed change:

-- imelnikov: --------------------------------------------------------------------------------
The review above proposes that versions be added to flows manually, but to me
that does not look like the best approach.

Subflows make it possible to split flow creation into several parts, and tasks
may be defined separately from flows. This way, versions attached to the flow
itself become almost unmanageable.

When I think about it, I tend to prefer flow versions that are content based.
That might be a git-like approach, where we calculate a hash over all needed
information, or a straightforward approach where we just save all needed
information, or both of them.

From my current point of view, the only information relevant in this use
case is the set of tasks, which may be represented as a set of tuples

    (task_name, task_version)

This information can be retrieved from storage -- it is already saved.
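For concreteness, the tuple representation could be built like this (a minimal sketch; the `Task` stand-in and `task_tuples` helper are illustrative, not TaskFlow's real API):

```python
from collections import namedtuple

# Illustrative stand-in for a task object; real TaskFlow tasks carry more state.
Task = namedtuple('Task', ['name', 'version'])


def task_tuples(tasks):
    """Represent a collection of tasks as a set of (name, version) tuples."""
    return {(t.name, t.version) for t in tasks}
```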

If we agree that this set of tuples is all the information we need for
migration, then it becomes obvious that only a few types of inconsistencies
are possible:

- old task removed: there is a task with the given name in storage, but not in the flow;
- new task added: there is a task with the given name in the flow, but not in storage;
- task version change: there is a task with the given name both in the flow and in
  storage, but the versions differ.
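Under these assumptions, detecting the three kinds of inconsistency reduces to a few set operations. A sketch (dicts of task name to version stand in for the flow and storage contents; names are hypothetical):

```python
def find_inconsistencies(flow_tasks, storage_tasks):
    """Classify differences between the tasks of the (re-created) flow and
    those recorded in storage.

    Both arguments are dicts mapping task name -> version (a dict is
    convenient since names are unique within a flow).
    """
    removed = set(storage_tasks) - set(flow_tasks)    # old task removed
    added = set(flow_tasks) - set(storage_tasks)      # new task added
    changed = {name for name in set(flow_tasks) & set(storage_tasks)
               if flow_tasks[name] != storage_tasks[name]}  # version differs
    return removed, added, changed
```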

We may provide an interface to register callbacks that will be called when a
certain inconsistency is detected. Migration then may operate as follows:

1. find an inconsistency
    - look for removed tasks first
    - then for added ones
    - then for changed versions
   (in fact, any order will do imo, but it should be well-defined)
2. if found, call appropriate callback(s)
    - callback(s) alter storage to resolve inconsistency;
    - they may resolve several inconsistencies at once;
3. go to step 1
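The loop above can be sketched as follows (hypothetical names throughout; callbacks mutate a plain dict standing in for storage, and the removed/added/changed order matches the steps above):

```python
def _diff(flow, storage):
    """Diff two name -> version dicts into the three inconsistency sets."""
    removed = set(storage) - set(flow)
    added = set(flow) - set(storage)
    changed = {n for n in set(flow) & set(storage) if flow[n] != storage[n]}
    return removed, added, changed


def migrate(flow, storage, callbacks):
    """Loop: find one inconsistency, call the matching callback, repeat.

    `callbacks` maps the inconsistency kind ('removed', 'added', 'changed')
    to a handler that alters `storage` to resolve it; a handler may resolve
    several inconsistencies at once, since the loop re-diffs every pass.
    """
    while True:
        removed, added, changed = _diff(flow, storage)
        if removed:                               # removed tasks first
            callbacks['removed'](sorted(removed)[0], storage)
        elif added:                               # then added ones
            callbacks['added'](sorted(added)[0], storage)
        elif changed:                             # then changed versions
            callbacks['changed'](sorted(changed)[0], storage)
        else:
            return                                # nothing left to resolve
```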


1. Would the flow 'version' then become just a sha1/md5 of all the contained task tuples? Would it be useful to have this at all, or would it just be a way to immediately determine that an 'inconsistency' is likely?

imelnikov: I think having the flow version as a hash of task tuples is not really useful: I doubt we'll ever have really giant flows, so just comparing the relevant data directly should be fast and good enough. Also, a utility function to do that comparison seems useful.
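For illustration, the content hash from question 1 could be as simple as a sha1 over the sorted tuple set (a sketch only; as the reply notes, direct comparison of the sets may be preferable):

```python
import hashlib


def flow_content_hash(task_tuples):
    """Git-like content version: sha1 over the sorted (name, version)
    tuples.  Sorting makes the hash independent of iteration order."""
    blob = repr(sorted(task_tuples)).encode('utf-8')
    return hashlib.sha1(blob).hexdigest()
```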

2. The above review has a concept of steps that can be applied; these seem useful. Would there be a concept of steps in your model? Steps in something like alembic seem to be a way to tie a version to its parent; if we have a similar concept, that might be useful. Each step would be tied to a flow 'version' (the above sha1/md5) and its parent. What do you think, too complicated?

imelnikov: Having 'steps' instead of the 'callbacks' I described above brings in one good feature: it is easy to ensure a consistent ordering of applications. I need to think about it more; maybe I'll come up with some kind of prototype...
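An alembic-style step chain might look something like this (purely illustrative and not any actual prototype; each step records the content version it upgrades from and the one it produces, which gives the consistent ordering mentioned above):

```python
def run_steps(steps, current_version, target_version):
    """Follow the parent -> child chain of steps from the current content
    version to the target one, applying each step in order.

    Each step is a dict with 'parent' (version it upgrades from),
    'version' (version it produces), 'name', and an 'apply' callable
    that alters storage as needed.
    """
    by_parent = {s['parent']: s for s in steps}
    version = current_version
    applied = []
    while version != target_version:
        step = by_parent[version]     # the step that upgrades from here
        step['apply']()               # resolve inconsistencies for this step
        applied.append(step['name'])
        version = step['version']
    return applied
```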

UPD: said prototype:

3. The part I am also wondering about is that not only are (task_name, task_version) useful to determine the flow 'content' but also the relations between tasks seem quite useful to know about, but maybe this can be skipped. Thoughts?

imelnikov: With the current model, changes in task relations do not require anything to be done: we just take the new relations from the re-created flow. Maybe there are use cases for migrations based on relation changes, flow metadata changes, or anything else not related to changed tasks, but I'd rather leave that out of scope of this blueprint.

Gerrit topic: bp/resumption-migrations


Work Items
