glance allows image delete even if it cannot be deleted from ceph backend store

Bug #1185851 reported by Edward Hope-Morley
This bug affects 2 people
Affects: glance_store
Status: Fix Released
Importance: Medium
Assigned to: Edward Hope-Morley

Bug Description

The following steps *should* reproduce the issue. Basically, if I have a copy-on-write cloned volume of an image, Glance allows me to delete the image from Glance itself even though the image cannot be deleted from the backend store, thus requiring manual deletion from the Ceph cluster.

Upload an image to Glance. This will create a snapshot of the raw image.

    glance image-create --name="testimage" --is-public="true" --disk-format="raw" --container-format="ovf" < precise-server-cloudimg-amd64-disk1.img
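
For reference, the Glance RBD store backs the uploaded image with an RBD image plus a protected snapshot (by default named 'snap') that serves as the clone parent. Assuming the default configuration, this can be confirmed with the rbd CLI:

    # lists the protected 'snap' snapshot that clones are created from
    rbd snap ls images/<img-id>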

Create a cinder volume clone of the image

    nova volume-create --image-id <img-id> --display-name test-vol 4
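
To confirm the new volume really is a copy-on-write clone rather than a full copy, rbd info on it should show a parent pointing back at the image's snapshot (volume naming follows the convention used in the listings below):

    # for a COW clone the 'parent:' field points at images/<img-id>@<snap-name>
    rbd info volumes/vol-<vol-id>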

We can then see that we have a snapshot image in the glance/images pool and a cloned image in the cinder/volumes pool

    rbd -p images ls| grep <img-id> # returns <img-id>
    rbd -p volumes ls| grep vol-<vol-id> # returns vol-<vol-id>

Now if I delete the glance image...

    glance delete <img-id>

I get a failure as expected since the snapshot is in use

    Request returned failure status.
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 383, in handle_one_response
        result = self.application(self.environ, start_response)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
        resp = self.call_func(req, *args, **self.kwargs)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
        return self.func(req, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 333, in __call__
        response = req.get_response(self.application)
      File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1296, in send
        application, catch_exc_info=False)
      File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1260, in call_application
        app_iter = application(self.environ, start_response)
      File "/usr/lib/python2.7/dist-packages/keystoneclient/middleware/auth_token.py", line 450, in __call__
        return self.app(env, start_response)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
        resp = self.call_func(req, *args, **self.kwargs)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
        return self.func(req, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 333, in __call__
        response = req.get_response(self.application)
      File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1296, in send
        application, catch_exc_info=False)
      File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1260, in call_application
        app_iter = application(self.environ, start_response)
      File "/usr/lib/python2.7/dist-packages/paste/urlmap.py", line 203, in __call__
        return app(environ, start_response)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
        return resp(environ, start_response)
      File "/usr/lib/python2.7/dist-packages/routes/middleware.py", line 131, in __call__
        response = self.app(environ, start_response)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
        return resp(environ, start_response)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
        resp = self.call_func(req, *args, **self.kwargs)
      File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
        return self.func(req, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 540, in __call__
        request, **action_args)
      File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 557, in dispatch
        return method(*args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/glance/common/utils.py", line 413, in wrapped
        return func(self, req, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 861, in delete
        self._initiate_deletion(req, image['location'], id)
      File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 813, in _initiate_deletion
        safe_delete_from_backend(location, req.context, id)
      File "/usr/lib/python2.7/dist-packages/glance/store/__init__.py", line 257, in safe_delete_from_backend
        return delete_from_backend(context, uri, **kwargs)
      File "/usr/lib/python2.7/dist-packages/glance/store/__init__.py", line 237, in delete_from_backend
        return store.delete(loc)
      File "/usr/lib/python2.7/dist-packages/glance/store/rbd.py", line 284, in delete
        raise exception.InUseByStore()
    InUseByStore: The image cannot be deleted because it is in use through the backend store outside of Glance.
     (HTTP 500)

But the image has been deleted from Glance, so there is no trace of it other than in Ceph itself

    glance index| grep <img-id> # returns nothing

Now if I delete the cloned volume...

    nova volume-delete <vol>

The rbd image is still around and I have no way of deleting it other than going
into Ceph and manually deleting it.

    rbd -p images ls| grep <img-id> # returns <img-id>
    rbd -p volumes ls| grep vol-<vol-id> # returns nothing
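
For reference, the manual cleanup referred to above looks roughly like this, assuming the default snapshot name 'snap' used by the Glance RBD store (adjust pool and image names to your deployment):

    # the snapshot can only be removed once it has no remaining clones
    rbd children images/<img-id>@snap
    # unprotect and remove the clone-parent snapshot, then the image itself
    rbd snap unprotect images/<img-id>@snap
    rbd snap rm images/<img-id>@snap
    rbd rm images/<img-id>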

I suggest we disallow deleting the image from Glance if it is 'in-use'. Perhaps
we could even give the user info on who/what is using the image so they can
resolve dependencies.
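
A minimal sketch of the kind of pre-delete check being proposed, written against the librbd Python bindings (the pool name, snapshot name and helper name are assumptions based on the default Glance RBD store settings, not actual Glance code):

    import rados
    import rbd

    def image_in_use(image_name, pool='images', snap_name='snap',
                     conffile='/etc/ceph/ceph.conf'):
        """Return True if the image's clone-parent snapshot still has children."""
        cluster = rados.Rados(conffile=conffile)
        cluster.connect()
        try:
            ioctx = cluster.open_ioctx(pool)
            try:
                image = rbd.Image(ioctx, image_name, snapshot=snap_name)
                try:
                    # list_children() returns the (pool, image) pairs cloned
                    # from this snapshot; any entry means it is still in use.
                    return len(image.list_children()) > 0
                finally:
                    image.close()
            finally:
                ioctx.close()
        finally:
            cluster.shutdown()

Glance could then refuse the delete (and report the children) while this returns True, instead of failing only after the registry entry has already been removed.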

Tags: backend
description: updated
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Ok, how about this for a workaround. When an image is deleted in Glance, if the image has clone(s) the backend delete will fail (but the Glance delete will succeed). In this case we could create a flag (object?) to indicate that the image is no longer needed by Glance. Then, each time Glance performs a delete (and/or other ops?), it can check these 'flags' and attempt to delete the volume if clones/children no longer exist.
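
Purely as an illustration of that flag-and-retry idea (not actual Glance code; the storage of the flags and the backend delete call are hypothetical):

    # illustrative sketch: track images whose backend data could not be removed
    # and opportunistically retry them on later delete operations
    pending_backend_deletes = set()

    def delete_image(image_id, backend_delete):
        """backend_delete(image_id) -> bool; True if the RBD data was removed."""
        if not backend_delete(image_id):
            # flag the image: Glance no longer needs it, but clones still exist
            pending_backend_deletes.add(image_id)
        # retry previously flagged images whose clones may be gone by now
        for flagged in list(pending_backend_deletes):
            if backend_delete(flagged):
                pending_backend_deletes.discard(flagged)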

Changed in glance:
assignee: nobody → Edward Hope-Morley (hopem)
Changed in glance:
status: New → In Progress
Revision history for this message
Zhi Yan Liu (lzy-dev) wrote :

I'm thinking: why not ask the Cinder Ceph driver to do a full clone of the image volume?

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Zhi, Cinder currently leverages the Glance v2 API to allow the user to do fast cloning of Glance images when creating volumes. If the user wants to do a full copy and manage the cloning within Cinder, they can switch to using the v1 API instead. The value of offering the option to clone from Glance rather than forcing a copy is, at the very least, performance and capacity savings. The solution proposed here *should* be fairly simple, allowing us to keep supporting this flexible approach for the user.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/55504

Changed in glance:
importance: Undecided → Medium
tags: added: rbd
Revision history for this message
Erno Kuvaja (jokke) wrote :

Edward,

Are you still working on this?

tags: added: backend
removed: ceph glance rbd
Revision history for this message
Edward Hope-Morley (hopem) wrote :

@jokke

This was put on hold following discussions in the Glance meetings and I have not had a chance to come back to it. We decided that the proposed solution needed improving to support doing cleanups fully asynchronously, instead of the current method of using the delete event as a trigger, which could cause unexpected timeouts if a large number of cleanups is required. We would also need to extend the locking logic, as it would need to be shared between deletes and cleanups (and more?).

Revision history for this message
Erno Kuvaja (jokke) wrote :

Thanks for the heads up. I was just wondering if this is still an existing issue or if the bug is just lingering here.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Gonna have another stab at this. Will first move code to https://github.com/openstack/glance_store/blob/master/glance_store. Instead of using deletes as the trigger for a cleanup we will use the scrubber in order to avoid the possibility of delete requests getting held up/timing out.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance_store (master)

Fix proposed to branch: master
Review: https://review.openstack.org/132863

Revision history for this message
Edward Hope-Morley (hopem) wrote :
affects: glance → glance-store
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on glance_store (master)

Change abandoned by Glance Bot (<email address hidden>) on branch: master
Review: https://review.openstack.org/132863

Revision history for this message
Ian Cordasco (icordasc) wrote :

Edward, do you plan to address the review feedback on https://review.openstack.org/#/c/132863/ ?

Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

Any progress on this bug?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Edward Hope-Morley (<email address hidden>) on branch: master
Review: https://review.openstack.org/132863

Revision history for this message
Abhishek Kekane (abhishek-kekane) wrote (last edit ):

Marking this bug as Fix Released as this issue is no longer reproducible on latest master.

Changed in glance-store:
status: In Progress → Fix Released
Revision history for this message
Jeffrey Zhang (jeffrey4l) wrote :

hey Abhishek,

Do you know how this was fixed? Do you know of any related patches? Thanks.
