Volume migration improvement

Registered by Vincent Hou

As discussed in a design session for Liberty, several work items are needed to make the current migration more stable and robust. The details can be found at https://etherpad.openstack.org/p/volume-migration-improvement.

The work items are summarized below. I would like to implement them for L (Liberty):
1) Volume state:

    *Volume status, not just migration_status, should indicate that a migration is in progress. It is very confusing that the volume status stays 'available' or 'in-use' during migration. The status should be set to "migrating" while the migration runs and set back to its original value when the migration is over.

    *Return message for the cinder client:

    When the migration command is issued, cinderclient should tell the user either that the migration will not proceed, and why, or that the request will be processed.

    *Does the destination volume need to be visible to the user? Option: keep it in the table with an additional flag.

    The target volume is created on the target host. It shows up in "cinder list", but the user cannot actually do anything with it, and it is removed eventually whether the migration succeeds or fails. IMO, the destination volume does not need to appear in cinder list.

    *Reporting errors back: how do we know whether the migration succeeded or failed?
    Add a 'get_migration_status' command that takes a volume name and returns the status of the last migration.
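The status handling, client message, and 'get_migration_status' command proposed above can be sketched with an in-memory volume record. `Volume`, `request_migration`, and `finish_migration` are hypothetical names, not Cinder's API, and the message wording is illustrative only; the status values 'available', 'in-use', 'migrating', 'success', and 'error' are the ones used in this blueprint.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Volume:
    """Hypothetical, simplified volume record (not Cinder's model)."""
    name: str
    status: str                             # 'available' or 'in-use' when idle
    migration_status: Optional[str] = None  # result of the last migration
    saved_status: Optional[str] = None      # original status while migrating


def request_migration(volume: Volume, dest_host: str) -> str:
    """Validate a migration request and return a message for the client."""
    if volume.status not in ("available", "in-use"):
        return (f"Migration of {volume.name} will not proceed: "
                f"volume status is '{volume.status}'.")
    if not dest_host:
        return f"Migration of {volume.name} will not proceed: no destination host."
    # The volume status itself now shows that a migration is running.
    volume.saved_status = volume.status
    volume.status = "migrating"
    return f"Request to migrate {volume.name} to {dest_host} will be processed."


def finish_migration(volume: Volume, succeeded: bool) -> None:
    """Restore the original status and record the migration result."""
    if volume.saved_status is None:
        raise ValueError(f"{volume.name} is not being migrated")
    volume.status = volume.saved_status
    volume.saved_status = None
    volume.migration_status = "success" if succeeded else "error"


def get_migration_status(volumes: Dict[str, Volume], name: str) -> Optional[str]:
    """Return 'success', 'error', or None if the volume was never migrated."""
    return volumes[name].migration_status
```

A second request for a volume that is already 'migrating' is rejected with a message instead of being silently queued, which is the kind of feedback the cinderclient item asks for.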

2) Proposal: should this go through the notification system?

    This is a problem that all the projects are facing right now.

3) Volume status recovery and progress indication:
Migration can fail at any step, so the volume should be returned to its original state automatically. Otherwise, the volume can be left in a state where nothing can be done to it, e.g. a volume with migration_status "completing" cannot be deleted. migration_status can be used to record the result of the previous migration, and the 'get_migration_status' command queries it: "error" means the previous migration failed, "success" means it succeeded, and "null" means no migration has been done. The status of the volume itself should be either in-use or available.
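The automatic recovery described above can be sketched as a wrapper that always restores the original status. The `Volume` class, `migrate_with_recovery`, and the two step functions are hypothetical illustrations, not Cinder code; each migration step is modeled as a plain function, and any exception raised by a step is treated as a failed migration.

```python
from typing import Callable, List, Optional


class Volume:
    """Hypothetical, simplified volume record (not Cinder's model)."""

    def __init__(self, name: str, status: str) -> None:
        self.name = name
        self.status = status                          # 'available' or 'in-use'
        self.migration_status: Optional[str] = None   # None: never migrated


Step = Callable[[Volume], None]


def migrate_with_recovery(volume: Volume, steps: List[Step]) -> bool:
    """Run migration steps; always restore the original volume status."""
    original_status = volume.status
    volume.status = "migrating"
    try:
        for step in steps:
            step(volume)
    except Exception:
        # A failure at any step is recorded instead of leaving the volume
        # stuck in an intermediate state such as 'completing'.
        volume.migration_status = "error"
        return False
    else:
        volume.migration_status = "success"
        return True
    finally:
        # Runs on both paths, so the volume is usable again afterwards.
        volume.status = original_status


# Hypothetical steps used for illustration.
def copy_data(volume: Volume) -> None:
    volume.migration_status = "migrating"


def complete_migration(volume: Volume) -> None:
    volume.migration_status = "completing"
    raise RuntimeError("simulated failure while completing")
```

For example, `migrate_with_recovery(Volume("vol-1", "available"), [copy_data, complete_migration])` returns False and leaves the volume 'available' with migration_status 'error', so it can still be deleted. Putting the restore in `finally` means no code path can skip it.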

Progress indication:
dd copy: running "sudo kill -USR1 [dd pid]" makes GNU dd report how many bytes it has copied so far; dividing that by the volume size gives the percentage of data transferred. The progress can be written to the log.
Driver-specific migration: take one driver, such as Storwize V7000, as an example to check whether it can report migration progress.
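The dd progress check above could be scripted roughly as follows. GNU dd responds to SIGUSR1 by printing its I/O statistics (records in/out and bytes copied) to stderr and then continuing the copy. `request_dd_stats` and `parse_progress` are hypothetical helpers; the regular expression assumes dd's output in the C locale, and when dd runs as root the signal must be sent with root privileges, which is why the text uses `sudo kill`.

```python
import os
import re
import signal
from typing import Optional

# GNU dd prints a line such as
#   "128974848 bytes (129 MB, 123 MiB) copied, 1.2 s, 104 MB/s"
# after the "records in/out" lines (C locale assumed).
_BYTES_COPIED = re.compile(r"^(\d+) bytes", re.MULTILINE)


def request_dd_stats(dd_pid: int) -> None:
    """Ask a running GNU dd process to print its I/O statistics."""
    os.kill(dd_pid, signal.SIGUSR1)


def parse_progress(dd_stderr: str, total_bytes: int) -> Optional[float]:
    """Return the percentage copied according to the latest dd report."""
    matches = _BYTES_COPIED.findall(dd_stderr)
    if not matches or total_bytes <= 0:
        return None
    copied = int(matches[-1])  # the most recent report wins
    return min(100.0, 100.0 * copied / total_bytes)
```

A caller would capture dd's stderr, call `request_dd_stats` periodically, and log the value returned by `parse_progress`.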

4) Efficient volume copy:

    It sounds like sparse dd resolves the performance issues in some, or even many, cases.

    Need to better understand the options here.

    Small block sizes are effective at discarding zeroes but slow to copy. Large block sizes are faster to copy but will keep more zeroes.

    Or does the dd option check individual 4 KiB blocks before writing?
Related BP link: https://blueprints.launchpad.net/cinder/+spec/efficient-volume-copy-for-cinder-assisted-migration
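The block-size trade-off above can be modeled with a short sketch, assuming, per the GNU coreutils documentation for `conv=sparse` (dd tries to seek rather than write NUL output blocks), that a block is skipped only if it is entirely zero. `sparse_copy_bytes_written` and `dd_sparse_command` are hypothetical helpers, and the model ignores filesystem allocation granularity and backend storage behavior.

```python
from typing import List


def sparse_copy_bytes_written(data: bytes, block_size: int) -> int:
    """Model a sparse copy: count bytes written when all-zero blocks are skipped."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    written = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # any() is False only if every byte is zero; then the copy seeks
        # past the block. One non-zero byte forces the whole block,
        # zeroes included, to be written.
        if any(block):
            written += len(block)
    return written


def dd_sparse_command(src: str, dst: str, block_size: int) -> List[str]:
    """Build a GNU dd argument list for a sparse copy (hypothetical helper)."""
    return ["dd", f"if={src}", f"of={dst}", f"bs={block_size}", "conv=sparse"]


# 1 MiB of zeroes with one non-zero byte at the start of every 64 KiB.
volume = bytearray(1024 * 1024)
for i in range(0, len(volume), 64 * 1024):
    volume[i] = 1

for bs in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(bs, sparse_copy_bytes_written(bytes(volume), bs))
```

With 4 KiB blocks only 64 KiB is written, while with 64 KiB or 1 MiB blocks the whole 1 MiB is written, which is the "small blocks discard more, large blocks keep more zeroes" effect described above.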

Blueprint information

Status: Started
Approver: Mike Perez
Priority: High
Drafter: Vincent Hou
Direction: Approved
Assignee: Vincent Hou
Definition: Approved
Series goal: None
Implementation: Needs Code Review
Milestone target: None
Started by: Mike Perez

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:migration-improvement,n,z

Addressed by: https://review.openstack.org/186327
    Volume migration improvement for L

Addressed by: https://review.openstack.org/186312
    Volume status management during migration

Addressed by: https://review.openstack.org/195443
    Add tempest tests for volume migration

Addressed by: https://review.openstack.org/204953
    Adds the migration progress support for migration

Addressed by: https://review.openstack.org/#/c/189547/
    Change cinderclient according to volume migration improvement

Gerrit topic: https://review.openstack.org/#q,topic:bp/migration-improvement,n,z

Addressed by: https://review.openstack.org/207754
    Adds migration abortion

Addressed by: https://review.openstack.org/210237
    WIP: Use cinder internal tenant to create the target volume

Addressed by: https://review.openstack.org/214941
    Update the devref for volume migration

<jdg> Sadly, I don't think we got enough teamwork behind this, and given that we've hit feature freeze and the number of outstanding patches with -1 etc., I don't think it's realistic to expect these to merge tonight or tomorrow. I'm afraid we should continue this effort in M and maybe pull together as a team to make it happen early in the first milestone.

Addressed by: https://review.openstack.org/250220
    Migration: take the local_path for the source volume

Addressed by: https://review.openstack.org/250586
    Migrate the snapshot or the clone instead of the volume
