Error status for images and image state management

Registered by Alex Meade

Instead of "killing" an image when something goes wrong during data upload, we need a good way to make it obvious the image failed to upload and notify the user. I propose we add an image error state and take a deeper look at image states in general.

There are a few use cases where this is important.

- Snapshots through nova
 When a snapshot fails, the image could be set to ERROR instead of being deleted/killed. This way users can see that a snapshot failed.

- Copy-from images
 If a copy-from image fails while the async worker is copying the data, the user has no way to know it failed, and it would be unintuitive for the image resource to simply disappear.

- Failed uploads
 In addition to error responses from Glance when an upload fails, it would be nice if the image resource were not deleted automatically and the user had a way to know it had errored. This way uploads could be retried without having to create a whole new image entity.
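
As a rough illustration of the proposal (not an agreed design), the sketch below models image status as a small state machine in which a failed upload moves the image to an error state, and the error state permits a retry. The `ERROR` status, the `ALLOWED` table and the `transition` helper are hypothetical names used only for this example; they are not part of Glance.

```python
from enum import Enum


class ImageStatus(Enum):
    QUEUED = "queued"
    SAVING = "saving"
    ACTIVE = "active"
    ERROR = "error"      # hypothetical status proposed by this blueprint
    DELETED = "deleted"


# Hypothetical transition table: a failed upload goes to ERROR instead of
# being killed, and ERROR can go back to SAVING so the upload can be retried.
ALLOWED = {
    ImageStatus.QUEUED: {ImageStatus.SAVING, ImageStatus.DELETED},
    ImageStatus.SAVING: {ImageStatus.ACTIVE, ImageStatus.ERROR, ImageStatus.DELETED},
    ImageStatus.ERROR: {ImageStatus.SAVING, ImageStatus.DELETED},
    ImageStatus.ACTIVE: {ImageStatus.DELETED},
    ImageStatus.DELETED: set(),
}


def transition(current, new):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new
```

With a table like this, the retry path for a failed upload is SAVING, then ERROR, then SAVING again on the same image entity.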

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: Feilong Wang
Direction: Needs approval
Assignee: Feilong Wang
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

markwash:
  I would like to propose that we address these problems with the addition of a control header in the
  HTTP requests for upload/create image requests. I'm not concerned with what the exact header would
  be, but if provided it should tell the glance server "don't kill this image if the upload fails".

  The question still remains as to how we indicate that the failure occurred.
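
  A minimal sketch of how such a control header might be honored on the server side is shown below. Everything in it is an assumption made for illustration: the header name `x-keep-image-on-failure`, the `handle_upload` function, the `last_error` field and the choice to return the image to `queued` are hypothetical and do not describe Glance's actual code. How the failure is reported is still the open question noted above; recording it on the image is only one possibility.

```python
KEEP_HEADER = "x-keep-image-on-failure"  # hypothetical header name


def handle_upload(image, headers, store_data):
    """Upload image data; on failure, kill the image unless the client opted out.

    `image` is a plain dict, `headers` maps header names to values, and
    `store_data` is a callable that writes the data and may raise IOError.
    """
    image["status"] = "saving"
    try:
        store_data()
    except IOError as exc:
        # Header names are case-insensitive, so normalize before the lookup.
        normalized = {name.lower(): value for name, value in headers.items()}
        keep = normalized.get(KEEP_HEADER, "").lower() == "true"
        if keep:
            # Assumption: keep the image and make it uploadable again.
            image["status"] = "queued"
            image["last_error"] = str(exc)  # hypothetical field
        else:
            image["status"] = "killed"
        return image
    image["status"] = "active"
    return image
```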

To Mark,

I am curious why the killed status is not displayed in glance. If the upload or snapshot fails, we had better display the image and show its error message, in case the end user keeps retrying the operation.

dperaza: Does it make sense to include defect 1191115 in the scope of this blueprint? An error can happen in the nova snapshot path even before the upload starts, for example while the image is being streamed from the instance disk. Covering that case would mean allowing the client side to call update to change the state and include the error. I also propose that we store the error text as a reserved image property, so that a client inspecting the image's progress can see why the image is now in error.

dperaza:
By the way, killed images do not show up when you list images, but they do show up if you query the details directly by ID. So if you save the URL before the error occurs, you can always check the status of that specific image, even if it goes to killed. Even if we decide not to add a new state, we should at least add an error property explaining why.
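
The behavior described above, together with the suggested error property, can be sketched with a small in-memory example. The `error_reason` property name, the `list_images` and `get_image` helpers and the sample data are hypothetical and chosen only for illustration; this is not the Glance API.

```python
KILLED = "killed"

# Sample data: one healthy image and one killed image carrying a
# hypothetical reserved property that explains the failure.
images = {
    "img-1": {"id": "img-1", "status": "active", "properties": {}},
    "img-2": {
        "id": "img-2",
        "status": KILLED,
        "properties": {"error_reason": "upload interrupted"},
    },
}


def list_images(store):
    """Listing hides killed images, as described in the comment above."""
    return [img for img in store.values() if img["status"] != KILLED]


def get_image(store, image_id):
    """A direct lookup by ID still returns a killed image and its properties."""
    return store[image_id]
```

A client that saved the ID of `img-2` could still read `error_reason` from a direct lookup even though the image no longer appears in the listing.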

flwang:
@dperaza, I'm working on this bp, which is based on the discussion between ameade and me, since I was working on an internal bug (146349). After discussing it with ameade and markwash several times, we prefer to add a "task" to track the image create action, and I think the "task" may be used to track other actions as well. For details, see the "task" proposal: https://blueprints.launchpad.net/glance/+spec/task-of-image-action

dperaza:
@flwang, after reading your blueprint I still don't think it will handle bug 1191115. Both bp image-error-state-management and bp task-of-image-action handle the error path during image upload. In my case the error is on the client side and the image will never be uploaded. I think the key to solving the issue I see in nova snapshot is to differentiate between a "still waiting for image" state and an "image will never come" state, so that clients can ignore or remove queued images in glance.

To summarize, the use case here is tracking snapshots and snapshot failures. This is a task for nova (pun intended!). We do not want to track "error" states on images. An image should not be in an error state--it should either exist and be active, or not exist. Rather, any errors should be tracked as part of a resource that models the act of creating the instance, such as import.
markwash rejected 2014-02-15

Work Items
