Improve performance of unshelve instance

Registered by Abhishek Kekane

The aim of this feature is to improve the performance of unshelving an instance by eliminating the time spent downloading or copying its snapshot. When an instance is shelved, all of its files are retained in the instance store on the compute node, whether that store is backed by shared or non-shared storage.

When you unshelve hundreds of instances at the same time, instance spawn time varies, depending mainly on the size of each instance snapshot and the network speed between the Glance and Nova servers.
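The dependency on snapshot size and network speed can be illustrated with a rough estimate. The sketch below is not part of the blueprint: the function name is hypothetical, and it assumes the link between Glance and Nova is shared evenly among concurrent downloads with no protocol overhead, so it gives only an optimistic lower bound.

```python
def estimated_download_seconds(snapshot_gib, link_gbit_per_s, concurrent_unshelves=1):
    """Lower-bound transfer time for one snapshot on an evenly shared link.

    Assumption (not from the blueprint): every concurrent unshelve gets an
    equal share of the link and there is no protocol or disk overhead.
    """
    if snapshot_gib < 0 or link_gbit_per_s <= 0 or concurrent_unshelves < 1:
        raise ValueError("sizes and speeds must be positive")
    snapshot_bits = snapshot_gib * 1024 ** 3 * 8            # GiB -> bits
    per_instance_bits_per_s = link_gbit_per_s * 1e9 / concurrent_unshelves
    return snapshot_bits / per_instance_bits_per_s


if __name__ == "__main__":
    # Hypothetical inputs: a 10 GiB snapshot over a 10 Gbit/s link.
    for n in (1, 100):
        seconds = estimated_download_seconds(10, 10, concurrent_unshelves=n)
        print(f"{n} concurrent unshelve(s): about {seconds:.0f} s per instance")
```

Under this model the per-instance time grows linearly with the number of simultaneous unshelves, which is why bulk unshelve operations are the case this blueprint targets.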

If you have configured the file store (shared storage) as the Glance backend for images and snapshots, you can improve unshelve performance dramatically by configuring nova.image.download.FileTransfer in Nova. In that case, Nova simply copies the instance snapshot as if it were stored on the local filesystem of the compute node. However, we have observed that network traffic between the shared storage servers and Nova then increases enormously, which slows instance spawning.
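The idea behind a direct file transfer can be sketched as follows. This is an illustration of the concept only, not the code in nova.image.download: the function name, its parameters, and the list of shared mount points are all hypothetical.

```python
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse


def copy_snapshot_if_local(location_url, dest_path, shared_mounts):
    """Copy a snapshot straight from a mounted filesystem when possible.

    Returns True if the file was copied, or False if the caller should
    fall back to a regular download from the image service.
    All names here are hypothetical and exist only for illustration.
    """
    parsed = urlparse(location_url)
    if parsed.scheme != "file":
        return False  # not a filesystem location, e.g. http(s)

    source = Path(unquote(parsed.path)).resolve()
    for mount in shared_mounts:
        try:
            # Raises ValueError when source is not under this mount point.
            source.relative_to(Path(mount).resolve())
        except ValueError:
            continue
        if source.is_file():
            shutil.copyfile(source, dest_path)
            return True
    return False
```

Even with this shortcut, the bytes still travel from the shared storage servers to the compute node, which matches the traffic increase described above.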

Use Cases
----------
1. A service provider wants to use the capacity of shelved instances to launch new instances on the compute host where those instances were shelved.
2. A user wants to use public images provided by the service provider to launch new instances.

Today, use case #1 is implemented by setting the "shelved_offload_time" parameter to 0 in nova.conf. This frees CPU, memory, and disk resources on the compute node for use by other instances, but it can lead to slower unshelve times when the instance was booted from an image, because unshelving then involves downloading a large snapshot from Glance.
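As a minimal sketch of that setting, the snippet below embeds a nova.conf fragment and checks it with Python's standard configparser. Placing the option in the [DEFAULT] section is an assumption, and the helper name is hypothetical; Nova itself reads its configuration through its own machinery, not this code.

```python
import configparser

# Assumed layout: the option is shown under [DEFAULT] for illustration.
NOVA_CONF_SNIPPET = """
[DEFAULT]
# Offload a shelved instance from its compute host immediately.
shelved_offload_time = 0
"""


def offloads_immediately(conf_text):
    """Return True only if shelved_offload_time is explicitly set to 0.

    Hypothetical helper: it makes no assumption about the option's
    default, so a missing option yields False.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    value = parser.getint("DEFAULT", "shelved_offload_time", fallback=None)
    return value == 0


if __name__ == "__main__":
    print(offloads_immediately(NOVA_CONF_SNIPPET))
```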

Fast unshelving is possible today if the shelved instance was booted from a volume. To get to that point, however, each tenant first needs to copy the public image to a volume, take a snapshot of it, and finally use that snapshot (which internally creates another volume) to launch new instances. This is a tedious and time-consuming process.

This is our main motivation for improving the performance of unshelving instances booted from an image.

Blueprint information

Status: Started
Approver: None
Priority: Undefined
Drafter: Abhishek Kekane
Direction: Needs approval
Assignee: Abhishek Kekane
Definition: New
Series goal: None
Implementation: Good progress
Milestone target: None
Started by: Abhishek Kekane

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/improve-unshelve-performance,n,z

Addressed by: https://review.openstack.org/135387
    Improve performance of UnShelve API

Addressed by: https://review.openstack.org/184871
    POC: Improve performance of Unshelve api

Work Items

Work items:
Move image creation from compute api to manager: DONE
Add new HostAggregateGroupFilter scheduler filter: DONE
Copy instance files from source to dest node: DONE
Improve performance of Unshelve api: DONE
