Resource usage will not be updated when suspending instance

Bug #1402502 reported by eminjin
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid       Wishlist    eminjin
openstack-manuals         Fix Released  Medium      Liyingjun

Bug Description

Suspending an instance does not update resource usage.

Suspending an instance should move all contents of RAM to hard disk.
The vCPU and used-memory counts should then decrease, and hard disk usage should increase.
However, this does not happen.

This causes trouble in the following scenario:
When the memory of all compute nodes is exhausted, suspending some running instances does not help to create a new instance; they have to be deleted instead.

eminjin (ming-jin)
summary: - Suspending an instance will not release occupied resource
+ Resource usage will not be updated when suspend a instance
summary: - Resource usage will not be updated when suspend a instance
+ Resource usage will not be updated when suspending instance
melanie witt (melwitt) wrote :

Hi eminjin,

The behavior you describe is not a bug -- resources are not freed when you suspend an instance.

The feature in nova that most resembles what you're looking for is 'nova shelve', please see doc:

http://docs.openstack.org/user-guide/content/shelve_server.html

Does this address your concern? Or would you like this launchpad report to be a feature request for freeing resources on suspend?

Changed in nova:
status: New → Incomplete
tags: added: compute
eminjin (ming-jin) wrote :

Hi, Melanie,

From the description of "suspend", I think the occupied resource should be released:
"""When you suspend an instance, its VM state is stored on disk, all memory is written to disk, and the virtual machine is stopped. Suspending an instance is similar to placing a device in hibernation; memory and vCPUs become available to create other instances."""

http://docs.openstack.org/user-guide/content/suspend_resume.html

Imagine that there is only one compute node with 1 vCPU, and "cpu_allocation_ratio" is set to "1.0".
First you start one VM that uses 1 vCPU. After a while you suspend it.
Since the resource usage is not updated, creating a new VM will fail.

Obviously, this violates the description "memory and vCPUs become available to create other instances".
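
The scenario above can be sketched as simple arithmetic. This is a minimal illustration, not nova's scheduler code: the function name `fits` and its parameters are hypothetical, and it assumes the vCPU limit is the physical vCPU count multiplied by `cpu_allocation_ratio`, with a suspended instance still counted as used.

```python
def fits(total_vcpus, cpu_allocation_ratio, used_vcpus, requested_vcpus):
    """Return True if the request fits under the vCPU limit (hypothetical helper)."""
    limit = total_vcpus * cpu_allocation_ratio
    return used_vcpus + requested_vcpus <= limit


# One node with 1 vCPU and ratio 1.0; the suspended VM still counts as 1 used vCPU.
print(fits(1, 1.0, used_vcpus=1, requested_vcpus=1))  # False: the new VM is rejected

# If suspending released the vCPU, the same request would fit.
print(fits(1, 1.0, used_vcpus=0, requested_vcpus=1))  # True
```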

Best Regards!
Jin Ming

melanie witt (melwitt) wrote :

Hi Jin,

I looked into this based on your comment and found some interesting things. From a technical, virt-level standpoint, you are right that resources on the hypervisor can be freed after suspending an instance. I think it's hypervisor dependent, but in the case of libvirt, I did observe that nova-compute.log on the hypervisor showed vcpus decrease by 1 when I suspended a tiny instance.

However, I still could not schedule an extra instance on the hypervisor with cpu_allocation_ratio = 1.0, as you said. I found this is because the nova scheduler (which uses the resource tracker) computes resource usage from data such as how many instances are on the hypervisor and what size they are, and does *not* use the values reported by the hypervisor. There are likely many reasons for this, some of which I think are if you scheduled the extra instance while one was suspended, and you resume the suspended one, you can immediately get into an overcommit situation you didn't intend, unless you migrate the extra instance, etc. Another reason might be race conditions between what the nova db knows and what the hypervisor sees.
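
The difference between the two views can be sketched as follows. This is an illustration under stated assumptions, not nova's resource tracker: the `Instance` class, the state strings, and both functions are hypothetical, and the hypervisor view simply assumes that a suspended guest holds no vCPUs.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    vcpus: int
    state: str  # hypothetical states: "active" or "suspended"


def tracked_vcpus(instances):
    """Usage computed from instance records: every instance on the host counts."""
    return sum(i.vcpus for i in instances)


def hypervisor_vcpus(instances):
    """Usage as a hypervisor might report it: only running guests count (assumption)."""
    return sum(i.vcpus for i in instances if i.state == "active")


host = [Instance("vm1", 1, "suspended")]
print(tracked_vcpus(host))     # 1
print(hypervisor_vcpus(host))  # 0
```

Scheduling against the first number is what blocks the extra instance. Scheduling against the second would admit it, and resuming vm1 would then push the host over its limit.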

So, it appears the bug here is in the user documentation, unfortunately. I will update this bug report as a Wishlist item for desired change in behavior, and add the documentation project so that can be fixed.

Changed in nova:
importance: Undecided → Wishlist
status: Incomplete → Opinion
tags: added: scheduler
Tom Fifield (fifieldt) wrote :

Triage Note: To fix this, find where http://docs.openstack.org/user-guide/content/suspend_resume.html lives and update the paragraph highlighted above.

Changed in openstack-manuals:
status: New → Triaged
importance: Undecided → Medium
milestone: none → kilo
Liyingjun (liyingjun)
Changed in openstack-manuals:
assignee: nobody → Liyingjun (liyingjun)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-manuals (master)

Fix proposed to branch: master
Review: https://review.openstack.org/142302

Changed in openstack-manuals:
status: Triaged → In Progress
eminjin (ming-jin) wrote :

Hi, Melanie,

"""There are likely many reasons for this, some of which I think are if you scheduled the extra instance while one was suspended, and you resume the suspended one, you can immediately get into an overcommit situation you didn't intend"""

I agree with you. I think that, besides refreshing resource usage after "suspend", the resource tracker should also be invoked before "resume". If the "resume" action would lead to an overcommit situation, either an error should be returned or an automatic migration should be performed.
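
A check of this kind might look like the sketch below. It is purely hypothetical: `OvercommitError`, `check_resume`, and the parameters are invented for illustration and do not correspond to any existing nova API. It assumes that suspended instances have already been removed from `used_vcpus`.

```python
class OvercommitError(Exception):
    """Raised when resuming would exceed the host's vCPU limit (hypothetical)."""


def check_resume(total_vcpus, cpu_allocation_ratio, used_vcpus, instance_vcpus):
    """Return the new usage if the instance can resume, else raise OvercommitError."""
    limit = total_vcpus * cpu_allocation_ratio
    new_usage = used_vcpus + instance_vcpus
    if new_usage > limit:
        raise OvercommitError(
            f"resume needs {instance_vcpus} vCPU(s), "
            f"but {used_vcpus} of {limit} are already in use"
        )
    return new_usage


# The host's only vCPU was given to another VM while this one was suspended.
try:
    check_resume(1, 1.0, used_vcpus=1, instance_vcpus=1)
except OvercommitError as exc:
    print("resume refused:", exc)
```

A caller could catch the error and either report it to the user or trigger a migration, which are the two options named above.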

I noticed that you set the "Importance" field of "OpenStack Compute (nova)" to "Wishlist". Does that mean this will be improved in the future?

The "suspend" function is very important for us, as a suspended instance can be resumed very quickly while also saving resources.
Unlike "shelve", a suspended instance is not shut down.
But if the resource usage is not refreshed after "suspend", why not leave the instance running? The released resources can't be used by new instances anyway.

Best Regards!
Jin Ming

melanie witt (melwitt) wrote :

Hi Jin,

I agree, there are many options, such as raising errors and doing automatic migrations, for handling the situations that will arise if we add more update points to the resource tracker.

The Wishlist field is for us to track items that are desired changes to the existing behavior of nova. It means an item might be improved in the future, or it might not. It depends on the outcome of discussions between users, operators, and devs as the work on nova is a community effort.

Your use case makes sense, and I'm sure many others have the same need. I suggest that you join in the community discussion by posting to the mailing lists [1] and chatting in the IRC channels [2]. It's the best way to learn why the design is the way it is, and how it can be changed in the future. I hope it helps.

[1] https://wiki.openstack.org/wiki/Mailing_Lists
[2] https://wiki.openstack.org/wiki/IRC

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-manuals (master)

Reviewed: https://review.openstack.org/142302
Committed: https://git.openstack.org/cgit/openstack/openstack-manuals/commit/?id=ea465673461d99f44491afb7f29852acd2a637e1
Submitter: Jenkins
Branch: master

commit ea465673461d99f44491afb7f29852acd2a637e1
Author: liyingjun <email address hidden>
Date: Wed Dec 17 09:52:22 2014 +0800

    Suspending instance will not update resource usage

    As reported in the bug, "memory and vCPUs NOT become available to
    create other instances when suspending instance". So update the
    description for ' Suspend and resume an instance'.

    Change-Id: I4c11c73fe9ba42e77c86cefe4360ce2496e810e1
    Closes-bug: #1402502

Changed in openstack-manuals:
status: In Progress → Fix Released
Changed in nova:
assignee: nobody → eminjin (ming-jin)
status: Opinion → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/148226

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Ming Jin (<email address hidden>) on branch: master
Review: https://review.openstack.org/146741
Reason: Branch problem.
Move to https://review.openstack.org/148226

melanie witt (melwitt) wrote :

Patch was abandoned.

Changed in nova:
assignee: eminjin (ming-jin) → nobody
status: In Progress → Opinion
melanie witt (melwitt) wrote :

Jin, I didn't see that you had opened a new review and abandoned the original. In the future, please update your existing review instead of creating new reviews.

Another note: as you have probably seen in the code, it appears that memory and disk are periodically returned for scheduling when an instance is suspended, but vcpus are not.

Changed in nova:
assignee: nobody → eminjin (ming-jin)
status: Opinion → In Progress
eminjin (ming-jin) wrote :

Hi,

I opened a new review request, https://review.openstack.org/148226, because of a branch problem.

The code that updates "vcpus" is in the file nova/compute/stats.py.
It follows the logic in the design base.

Best Regards!
Jin Ming

melanie witt (melwitt) wrote :

Jin, I understand. I'm just saying that if you use the same Change-Id in your commit message after you fix the branch issue, you can keep the same review you had (with a new patch set) instead of creating a new review. We want to avoid new reviews because duplicates are confusing and make it harder for people to find your active review.
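
Gerrit matches a pushed commit to an existing review by the `Change-Id` trailer in the commit message (for the same project and target branch). As a rough sketch of that idea, the hypothetical helper below extracts the trailer from two commit messages so they can be compared before pushing. The example messages and the ID are made up, and the regular expression assumes the usual form of `I` followed by 40 hex digits.

```python
import re

# Matches a trailer line such as "Change-Id: I<40 hex digits>".
CHANGE_ID_RE = re.compile(r"^Change-Id:\s*(I[0-9a-f]{40})\s*$", re.MULTILINE)


def change_id(message):
    """Return the Change-Id found in a commit message, or None if absent."""
    match = CHANGE_ID_RE.search(message)
    return match.group(1) if match else None


# Made-up commit messages: the original patch and an amended version of it.
old = (
    "Update usage on suspend\n\n"
    "Change-Id: I0123456789abcdef0123456789abcdef01234567\n"
)
new = (
    "Update usage on suspend (rebased)\n\n"
    "Change-Id: I0123456789abcdef0123456789abcdef01234567\n"
)

# Same Change-Id, so the push would become a new patch set on the same review.
print(change_id(old) == change_id(new))  # True
```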

Davanum Srinivas (DIMS) (dims-v) wrote :

Fix proposed is in progress - https://review.openstack.org/#/c/148226/

Changed in nova:
status: In Progress → Confirmed
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by John Garbutt (<email address hidden>) on branch: master
Review: https://review.openstack.org/148226
Reason: this seems to have been abandoned, pressing the button so it's not stuck in the queue still. Please do restore this if that's not the case.

Chris Dent (cdent) wrote :

The proposed change is very old, abandoned, and has a -1. There hasn't been subsequent feedback, so I am abandoning the bug.

Changed in nova:
status: Confirmed → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-manuals 15.0.0

This issue was fixed in the openstack/openstack-manuals 15.0.0 release.
