In-place rebuild for instances with a NUMA topology

Registered by sean mooney on 2019-10-08

Rebuild is commonly used to upgrade a stateless workload to a new revision of an application.
In this model, a user uploads a new image containing the new VNF or other application to Glance and then, without changing any other parameters on the image (i.e. the image metadata), rebuilds the instance to complete the upgrade.
In such deployments it is not uncommon to orchestrate this using Heat or Ansible.
When we do a rebuild we use a no-op claim and therefore do not change the resources claimed on the host. To enable in-place rebuilds with a NUMA topology, it is proposed that we first address bug https://bugs.launchpad.net/nova/+bug/1763766 (nova needs to disallow resource consumption changes on image rebuild) by implementing an explicit check in the API to reject rebuild requests where the NUMA topology would change, altering the resource consumption.
We would then address bug https://bugs.launchpad.net/nova/+bug/1804502 (Rebuild server with NUMATopologyFilter enabled fails (in some cases)) by skipping the NUMA topology filter on rebuild once we have asserted that the resource consumption will not change.
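
The two steps proposed above can be sketched roughly as follows. This is a minimal, self-contained illustration and not nova's actual code: the function names, the RebuildRejected exception, and the choice of which image properties count as NUMA-related are all assumptions made for the example. Only the property names (hw_numa_nodes, hw_cpu_policy, hw_mem_page_size) and the NUMATopologyFilter name are taken from nova.

```python
# Hypothetical sketch: none of these helpers are real nova APIs.

# Assumption: image properties that can influence the guest NUMA topology.
NUMA_PROPERTY_PREFIX = "hw_numa_"
NUMA_RELATED_NAMES = {"hw_cpu_policy", "hw_cpu_thread_policy", "hw_mem_page_size"}


class RebuildRejected(Exception):
    """Raised when a rebuild would change the NUMA-related request."""


def numa_properties(image_props):
    """Return only the properties assumed to affect the NUMA topology."""
    return {
        key: str(value)  # normalise so 2 and "2" compare equal
        for key, value in image_props.items()
        if key.startswith(NUMA_PROPERTY_PREFIX) or key in NUMA_RELATED_NAMES
    }


def check_rebuild(old_image_props, new_image_props):
    """Step 1 (bug 1763766): reject a rebuild that would change the topology."""
    if numa_properties(old_image_props) != numa_properties(new_image_props):
        raise RebuildRejected(
            "rebuild would change the NUMA topology and resource consumption"
        )


def filters_for_request(enabled_filters, is_rebuild):
    """Step 2 (bug 1804502): skip the NUMA filter once step 1 has passed."""
    if is_rebuild:
        return [name for name in enabled_filters if name != "NUMATopologyFilter"]
    return list(enabled_filters)
```

In this sketch, a rebuild from an image with hw_numa_nodes=2 to another image with the same NUMA properties passes check_rebuild, after which the scheduler would run with every enabled filter except NUMATopologyFilter. A rebuild to an image with hw_numa_nodes=4 raises RebuildRejected before any scheduling happens.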

This blueprint tracks support for the in-place NUMA rebuild feature. No spec has been filed, as the proposal is to add the feature by resolving the previously stated bugs.
I assert that the fix for bug https://bugs.launchpad.net/nova/+bug/1763766, which will block any rebuild whose NUMA topology request would otherwise be ignored until a move operation is performed on the instance, should be backportable, as it blocks broken behavior. The new functionality that addresses https://bugs.launchpad.net/nova/+bug/1804502, enabled by selectively skipping the NUMA topology filter on rebuild, is likely not in line with the stable policy as it is, in fact, a new feature.
That said, for full transparency, a customer has asked me to backport this downstream to Queens, which the downstream policy would allow if and only if the feature is accepted upstream.

Blueprint information

Status:
Not started
Approver:
None
Priority:
Undefined
Drafter:
sean mooney
Direction:
Needs approval
Assignee:
sean mooney
Definition:
New
Series goal:
None
Implementation:
Unknown
Milestone target:
None

Whiteboard

Gerrit topic: https://review.opendev.org/#/q/topic:bug/1763766

Addressed by: https://review.opendev.org/687957
    block rebuild when numa toplogy changed

If these are bugs just track them as bugs, not a blueprint, assuming the bugs aren't more like "lack of a feature is not a bug". You generally can't backport changes related to a blueprint to stable branches and if these are bug fixes I'm assuming you'd want the option to backport the fixes. -- mriedem 20191017

[mriedem 20191017] We talked about this in the nova meeting today:

http://eavesdrop.openstack.org/meetings/nova/2019/nova.2019-10-17-14.00.log.html#l-166

But we do not seem to have reached a consensus on whether this should be a blueprint or whether the bugs should just be fixed. I think one of the tricky questions is whether people have gotten used to the behavior of rebuild + new image + new topology and then migrating the server to match the topology. But there is also some precedent for outright failing rebuild operations that don't honor the request, e.g. bug 1482040.

Addressed by: https://review.opendev.org/689861
    Disable NUMATopologyFilter on rebuild

