Nova changes to support Local Resource Management that uses Resource Management Daemon

Registered by Dakshina Ilangovan

Low-latency and real-time workloads are sensitive to resource contention on a compute node. As the density of software running on the platform increases, software components such as VNFs and containers, and infrastructure components such as a vSwitch or vRouter, often compete for hardware resources, which can have a significant impact on performance and service availability.

To prevent this, one solution is to leave some resource headroom when scheduling tasks onto a machine. Even this can prove inadequate when load spikes, for example when network traffic bursts affect Quality of Service (QoS) or Quality of Experience (QoE). Also, with Service Function Chaining, VNFs can be selectively prioritized to deliver an on-demand service. Such decisions require dynamic adjustments to the resources allocated to the VNFs to meet the desired performance. To facilitate greater workload packing and to handle changes in workload characteristics or priorities, fine-grained resource control at sub-second granularity on the compute node is valuable. Orchestrators today cannot provide this on each compute node. For instance, hardware resources such as Last Level Cache (LLC), memory bandwidth and power management are shared across multiple VMs and need dynamic, time-sensitive adjustments at node level, which is why a node agent like RMD is required.

Secondly, since VNFs share a platform, hardware partitioning and isolation can be critical. Such partitioning lets each application get the resources it needs when it needs them, without being adversely affected by other VNFs. Interference of this kind is sometimes called the “noisy neighbor” problem, and RMD aims to address it. For instance, with 5G network slicing, a carrier may want to manage the VNF resource partitions dynamically.

The Resource Management Daemon (RMD) runs on selected servers and bridges this gap by providing the following features:

1. Provide finer-grained local control of resources through short-term monitoring on the local node and the ability to make dynamic adjustments to existing allocations, consistent with user requests.
2. Provide high-level abstraction of allocation requests of resources on the local platform.
3. Associate Quality of Service (QoS) metadata with resource allocation requests.
4. Provide hooks into the platform to address different interfaces for the same or multiple resources. This keeps OpenStack independent of any particular vendor or platform.

Currently, Nova provides only static allocation of hardware resources when an instance such as a VM is scheduled. It has no concept of local node monitoring or of fine-tuning a subset of hardware resources to improve or maintain performance. Refer to the Alternatives section for the implications of using a hypervisor management layer such as Libvirt alone to manage hardware resources.

OpenStack workloads can benefit from RMD when it is deployed on a compute node, where it can extend the capabilities of a hypervisor management layer such as Libvirt beyond static allocation of resources.

Implications for Nova:

We propose a solution for OpenStack Nova to leverage the Resource Management Daemon (RMD) to manage a subset of host resources traditionally not visible to Nova. Managing these resources allows finer-grained control and more predictable performance. This will require running RMD on the compute nodes and modifying Nova Compute.

This document has two parts. The first part explains the integration between OpenStack and RMD and the changes required in Nova Compute to obtain resource management support for resources under RMD. The two primary changes in this first part are:

• Enabling a path between Nova Compute and RMD that allows Nova Compute to query the inventory of resources managed by RMD, and to delegate resource allocation requests to RMD. As always, the OpenStack flavor determines the eligible hosts and the amounts and types of the resources requested for an instance.
• Enabling a path from Nova Compute to Placement API to update resource inventory information obtained from RMD.
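As a rough illustration of the second change, the sketch below converts an inventory reported by a node agent into the request body that the Placement API accepts for a resource provider's inventories. The shape of the input dictionary, the resource names in it, and both function names are assumptions made for illustration; the blueprint does not define how RMD reports its inventory or which resource classes would be used.

```python
import re
from typing import Any, Dict


def to_resource_class(name: str) -> str:
    """Turn an arbitrary resource name into a Placement custom resource class.

    Placement requires custom classes to start with CUSTOM_ and to contain
    only uppercase letters, digits and underscores.
    """
    normalized = re.sub(r"[^A-Z0-9_]", "_", name.upper())
    return "CUSTOM_" + normalized


def build_inventory_payload(rmd_inventory: Dict[str, int],
                            generation: int) -> Dict[str, Any]:
    """Build the body for PUT /resource_providers/{uuid}/inventories.

    rmd_inventory is a hypothetical mapping of resource name to total units,
    standing in for whatever the node agent actually reports.
    """
    inventories = {}
    for name, total in rmd_inventory.items():
        if total <= 0:
            # Resources with nothing to offer are left out of the inventory.
            continue
        inventories[to_resource_class(name)] = {
            "total": total,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": total,
            "step_size": 1,
            # No overcommit: these resources are partitioned, not shared.
            "allocation_ratio": 1.0,
        }
    return {
        "resource_provider_generation": generation,
        "inventories": inventories,
    }
```

Setting `allocation_ratio` to 1.0 is a design choice in this sketch rather than something the blueprint states: it reflects the idea that partitioned resources should not be overcommitted.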

The second part illustrates, with examples, how the OpenStack and RMD integration described above can leverage two latency-critical hardware resources enabled by RMD, namely Last Level Cache (LLC) and memory bandwidth.

Note that RMD is intended to expand to more resources than the two listed here.
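One way a flavor could carry requests for resources such as LLC and memory bandwidth is through Nova's existing `resources:<CLASS>=<amount>` flavor extra specs. The sketch below pulls the custom resource requests out of a flavor's extra specs so that they could be handed to a node agent. The resource class names and units in the usage example, and the function name, are hypothetical; the blueprint does not specify them.

```python
from typing import Dict

RESOURCES_PREFIX = "resources:"


def requested_custom_resources(extra_specs: Dict[str, str]) -> Dict[str, int]:
    """Return the custom resource classes and amounts requested by a flavor.

    Only keys of the form resources:CUSTOM_<NAME> are considered; standard
    classes such as VCPU and unrelated extra specs are ignored.
    """
    requested = {}
    for key, value in extra_specs.items():
        if not key.startswith(RESOURCES_PREFIX):
            continue
        resource_class = key[len(RESOURCES_PREFIX):]
        if resource_class.startswith("CUSTOM_"):
            requested[resource_class] = int(value)
    return requested


# Hypothetical flavor extra specs; class names and units are illustrative.
example_specs = {
    "resources:CUSTOM_LLC_WAYS": "2",
    "resources:CUSTOM_MEMORY_BANDWIDTH_MBPS": "1000",
    "resources:VCPU": "4",
    "hw:cpu_policy": "dedicated",
}
print(requested_custom_resources(example_specs))
```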

Blueprint information

Status:
Started
Approver:
None
Priority:
Undefined
Drafter:
Dakshina Ilangovan
Direction:
Needs approval
Assignee:
Dakshina Ilangovan
Definition:
Drafting
Series goal:
None
Implementation:
Started
Milestone target:
ongoing
Started by
Dakshina Ilangovan

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/nova-changes-to-support-local-resource-management-that-uses-resource-management-daemon,n,z

Addressed by: https://review.openstack.org/630817
    Nova changes to support Local Resource Management that uses Resource Management Daemon
