Add generic image transfer layer to nova.virt

Registered by Zhi Yan Liu

Glance now supports adding and removing multiple locations in an image's metadata, so an image may have more than one location within the backend store. Nova should add a layer that transparently handles preparing the image for an instance using the best approach and location. The layer should let the cloud administrator configure the image handler pipeline and the order in which handlers are preferred.

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: Zhi Yan Liu
Definition: Pending Approval
Series goal: None
Implementation: Deferred
Milestone target: None

Whiteboard

Summary
=======

There are a few immediate uses of this:

1) Allowing different data transfer mechanisms from a Glance backend
to a file - merged in [1] [2] [3]

2) Abstracting out where, in addition to how, an image is transferred
and stored - so future handlers can do more than just downloading to a
file - merged in [4]

3) Implementing alternate image handlers for more efficiency or other
requirements - merged in [5] [6], up for review [7] [8] [9]

Already merged:
[1] https://review.openstack.org/37817
    Add plug-in modules for direct downloads of glance locations.

[2] https://review.openstack.org/40336
    Pass the destination file name to download modules

[3] https://review.openstack.org/42420
    Add context information to download plugins

Merged but then reverted due to problems with glance v2 API:

[4] https://review.openstack.org/33409
    Adding image multiple location support

[5] https://review.openstack.org/59148
    Move libvirt RBD utilities to a new file

[6] https://review.openstack.org/59149
    enable cloning for rbd-backed ephemeral disks

Up for review:

[7] https://review.openstack.org/63975
    VMware: Use the image handler to fetch images

[8] https://review.openstack.org/67606
    VMware: Copy image handler

[9] https://review.openstack.org/73155
    Adding image multiple location support -- separate configure option

-- jdurgin - added summary and split into sections (Tue Feb 4 14:39:16 PST 2014)

Details on image handlers
=========================

Glance currently supports storing an image in multiple backend stores. With this feature, Nova, as the consumer, could use images more effectively, and the following functions become possible:

1. If one image backend store fails, Nova can consume the image from another one.
2. For each image backend store, Nova could evaluate the backend model through which the image location is communicated and use storage-specific technology to handle the image. For example, it could use a read-only volume as the local template image on the compute node rather than transferring the actual image bits, which would significantly reduce data transfer overhead and increase Glance efficiency.
3. Nova should allow the administrator to configure the image handling pipeline in the order they prefer; for example, the administrator could require Nova to try the RBD volume-based handler first and then fall back to the file-based one.
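Point 1 amounts to trying each location in turn until one works. A minimal sketch, assuming locations are tried in the order they are listed and that a failing backend raises an exception; `fetch_with_failover`, the `fetch` callable and `AllLocationsFailed` are hypothetical names, not Nova APIs.

```python
from typing import Callable, Iterable, List, Tuple


class AllLocationsFailed(Exception):
    """Raised when no image location could be consumed (hypothetical error)."""


def fetch_with_failover(locations: Iterable[str],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Try each image location in order and return the first successful fetch.

    `fetch` stands in for a backend-specific transfer; any exception it
    raises is treated as that backend being unavailable.
    """
    errors: List[Tuple[str, Exception]] = []
    for url in locations:
        try:
            return fetch(url)
        except Exception as exc:  # record the failure and fall back
            errors.append((url, exc))
    # Every location failed: report all of them together.
    raise AllLocationsFailed(errors)
```

If `fetch` raises for `rbd://imagename` because the Ceph cluster is unreachable, the function moves on to the next location instead of failing the whole boot; only when every store fails does the caller see an error.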

Based on the above requirements, the implementation design is as follows:
1. Change nova.image.glance.GlanceImageService to expose get_locations() as a public interface.
2. Add a nova.virt.images.ImageHandler base class to the Nova common image layer. Initially this class behaves only like an image fetcher. As a next step, particular subclasses could be implemented in the relevant hypervisor layers with more advanced functions built on this structure, such as CoW creation and snapshot capture.
3. Implement a nova.virt.images.DefaultImageHandler subclass in the Nova common image layer. As the default handler, it downloads the image and saves it locally as a regular file, and uses os.unlink to remove the image, matching Nova's current default behaviour.
4. Implement a load_image_handlers() entry function, called when a particular hypervisor driver initializes. It uses stevedore to load and construct the image handlers the user configured, as registered in setup.cfg.
5. Implement a nova.virt.images.handle_image() function that evaluates the scheme through which the image location is communicated and uses the matching handler to handle that image, following the image handler pipeline order configured by the administrator.
6. Change the nova.virt.images.fetch_to_raw() entry interface function to use the new image handler framework via handle_image().
7. Adapt the other parts affected by the change, such as nova.virt.libvirt.ImageCacheManager.
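Steps 2 to 5 can be sketched together as a small pipeline. This is a minimal illustration, assuming handlers are selected by the URL scheme of each location and that a failed fetch raises; the method names `can_handle`, `fetch_image` and `remove_image`, and the `registry` dict, are hypothetical and not the interfaces from the actual patches. The real design loads handlers through stevedore entry points declared in setup.cfg; a plain dict stands in for that here, and a local `file://` copy stands in for the HTTP download from Glance.

```python
import os
import shutil
import tempfile
import urllib.parse
from typing import Dict, List, Optional, Tuple, Type


class ImageHandler:
    """Base class: a handler declares which location schemes it can consume."""

    schemes: Tuple[str, ...] = ()

    def can_handle(self, location: str) -> bool:
        return urllib.parse.urlparse(location).scheme in self.schemes

    def fetch_image(self, location: str, dst_path: str) -> None:
        """Make the image available at dst_path, or raise on failure."""
        raise NotImplementedError

    def remove_image(self, dst_path: str) -> None:
        raise NotImplementedError


class DefaultImageHandler(ImageHandler):
    """Download-style handler that stores the image as a regular local file."""

    schemes = ("file",)

    def fetch_image(self, location: str, dst_path: str) -> None:
        # Stand-in for the HTTP download: copy the bits to a local file.
        shutil.copyfile(urllib.parse.urlparse(location).path, dst_path)

    def remove_image(self, dst_path: str) -> None:
        os.unlink(dst_path)


def load_image_handlers(names: List[str],
                        registry: Dict[str, Type[ImageHandler]]) -> List[ImageHandler]:
    """Instantiate handlers in the administrator's configured order."""
    return [registry[name]() for name in names]


def handle_image(locations: List[str], pipeline: List[ImageHandler],
                 dst_path: str) -> Optional[ImageHandler]:
    """Return the first handler, in pipeline order, that prepares the image."""
    for handler in pipeline:
        for location in locations:
            if not handler.can_handle(location):
                continue
            try:
                handler.fetch_image(location, dst_path)
                return handler
            except Exception:
                continue  # this location failed; try the next option
    return None  # no configured handler could prepare the image


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "base.img")
    with open(src, "wb") as f:
        f.write(b"image bits")
    # e.g. the names parsed from image_handlers=download in nova.conf
    pipeline = load_image_handlers(["download"], {"download": DefaultImageHandler})
    dst = os.path.join(workdir, "instance.img")
    used = handle_image(["rbd://imagename", "file://" + src], pipeline, dst)
    print(type(used).__name__)  # DefaultImageHandler
```

The `rbd://` location is skipped because no configured handler declares that scheme; adding an RBD clone handler ahead of `download` in the pipeline would let it win whenever the Ceph location is usable.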

By the way, this reference may be useful: https://etherpad.openstack.org/p/linked-template-image
-- zhiyan

Discussion
==========

My concern with this approach is that a user could easily have an image containing several locations with different files (for example, one location pointing to a *.ovf and another to a *.vmdk, or even worse, one to a *.qcow2 and another to a *.jpg). In the latter case, the *.jpg has nothing to do with the actual image.
Yes, it is always possible to trick the system; however, I think this approach opens the door to a lot of corner cases.
I would prefer an abstraction one level above the image, with references to several images.
We already discussed this at the summit, especially for the OVF case, where you have a descriptor/data pair...
-- arnaud (twoputt)

@twoputt, Glance currently doesn't allow an image to have different content at its different locations: an image entry has only one checksum field, so Glance requires all backends/locations to hold the same bits.
--zhiyan
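Because each image carries a single checksum, a consumer can confirm that bits fetched from any location are identical. A minimal sketch, assuming the checksum is the MD5 hex digest that Glance records in the image's `checksum` field; `verify_checksum` is a hypothetical helper, not part of Nova.

```python
import hashlib
from typing import Iterable


def verify_checksum(chunks: Iterable[bytes], expected_md5_hex: str) -> bool:
    """Return True if the streamed image data matches the image's checksum field."""
    digest = hashlib.md5()
    for chunk in chunks:
        # Hash incrementally so large images never need to be held in memory.
        digest.update(chunk)
    return digest.hexdigest() == expected_md5_hex
```

Whichever location the data came from, a mismatch signals that the location does not hold the image Glance describes.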

Flavio Percoco (flaper87) is currently working with me on this. Related blueprint and etherpad: https://blueprints.launchpad.net/nova/+spec/nova-image-zero-copy , https://etherpad.openstack.org/p/nova-glance-zero-copy -- zhiyan

Deferred to icehouse-3 as the blueprint was not approved by the icehouse-2 blueprint approval deadline. --russellb

Feature example instructions
============================

1. Enable the image handlers you need in the Nova configuration:
  [DEFAULT]
  image_handlers=libvirt_rbd_clone,vmware_copy,download
2. Assign multiple storage locations to the image via PATCH, e.g.:
  $ curl -X PATCH -H 'X-Auth-Token: 7792c8f4500d4b2d9699135815840b26' \
    -H 'User-Agent: python-client' -H 'Content-Type: application/openstack-images-v2.1-json-patch' \
    -i http://ip:9292/v2/images/image-id \
    -d '[{"op": "replace", "path": "/locations", "value": [
           {"url": "vsphere://ip/folder/openstack_glance/2332298?dcPath=dc&dsName=ds", "metadata": {}},
           {"url": "rbd://imagename", "metadata": {}}]}]'
3. Enable exposure of image locations in the Glance configuration:
  [DEFAULT]
  show_multiple_locations = True
  show_image_direct_url = True
4. Provision a VM from the image prepared in step 2.
5. As a result, if the end user is using the libvirt hypervisor, the image will be created on the compute host with a "zero-copy" approach: the VM's ephemeral disk will be cloned directly from the Ceph cluster backend instead of downloading all the image bits from Glance over the HTTP/RESTful API, which removes much of the image preparation overhead. By the same mechanism, if the end user is using an ESXi or vCenter (VC) hypervisor, the image will be created with a similar approach instead of being downloaded from Glance. Because the image is already stored on the same backend storage as the VM's ephemeral disk, pulling the data out and pushing it back is unnecessary.
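The PATCH request in step 2 can also be prepared from Python using only the standard library. A minimal sketch, assuming the same endpoint, headers and JSON-patch body as the curl command; `build_locations_patch` is a hypothetical helper, and it only builds the request without sending it.

```python
import json
import urllib.request
from typing import List


def build_locations_patch(glance_url: str, image_id: str, token: str,
                          location_urls: List[str]) -> urllib.request.Request:
    """Build a PATCH request that replaces an image's location list."""
    body = [{
        "op": "replace",
        "path": "/locations",
        # Each location carries an (empty) metadata dict, as in the curl example.
        "value": [{"url": url, "metadata": {}} for url in location_urls],
    }]
    return urllib.request.Request(
        url=f"{glance_url}/v2/images/{image_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Auth-Token": token,
            "Content-Type": "application/openstack-images-v2.1-json-patch",
        },
        method="PATCH",
    )
```

Passing the returned object to `urllib.request.urlopen()` would send it; keeping construction separate makes the body easy to inspect before anything touches Glance.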

--zhiyan

Gerrit topic: https://review.openstack.org/#q,topic:bp/image-multiple-location,n,z

Gerrit topic: https://review.openstack.org/#q,topic:bug/1226351,n,z

Remaining general patches:
==========================
https://review.openstack.org/#/c/73155/

VMware patches to use shared image code:
========================================
https://review.openstack.org/#/c/63975/
https://review.openstack.org/#/c/67606/

Gerrit topic: https://review.openstack.org/#q,topic:bp/vmware-clone-image-handler,n,z

Addressed by: https://review.openstack.org/86583
    Propose: Adding image multiple location support
