Efficiently pin running VMs to physical CPUs automatically

Registered by Giorgio Franceschi

A tool for automatically pinning each running virtual CPU to a physical one in the most efficient way, balancing load across sockets/cores and maximizing cache sharing (i.e. minimizing cache misses). Ideally, it should be able to run on demand, as a periodic job, or when triggered by events on the host (VM spawn/destroy).

Blueprint information

Status:
Started
Approver:
None
Priority:
Undefined
Drafter:
Giorgio Franceschi
Direction:
Needs approval
Assignee:
Giorgio Franceschi
Definition:
Drafting
Series goal:
Proposed for grizzly
Implementation:
Needs Code Review
Milestone target:
None
Started by
Giorgio Franceschi

Related branches

Sprints

Whiteboard

We think it would be beneficial to integrate a tool (codenamed Pinhead) that we developed in-house into OpenStack.

We run a mix of light- and heavy-load nodes with different hardware configurations, and we want each host to run under an efficient configuration as far as CPU allocation is concerned. Two problems concern us: L2/L3 cache misses, and VMs roaming across CPUs on the host, which introduces execution delays and occasional latency spikes as interrupts and similar events keep getting rerouted. Network operations seem to be especially vulnerable to this.
Pinhead works by examining the current hardware configuration of the host and gathering information about the running VMs (how many there are, how many vCPUs each has, etc.). These two pieces of information are used to devise an optimal allocation strategy, which is then applied to the host through libvirt calls. The tool is completely stateless and recomputes everything from scratch on each run; this is not a problem, because gathering hardware information is near-instantaneous and devising the strategy takes only a couple of seconds. Redundant pinning requests are avoided by checking whether the strategy for each VM already matches its running settings, in which case the call is skipped, so action is only taken when necessary. Being stateless also makes the tool easy to run as a cron job (every 15 minutes or so), on demand (manually, as desired), or on a nova trigger (using nova hooks for instance spawning and destruction).
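The "skip redundant pinning" step can be sketched as below. This is an illustration, not Pinhead's code: it assumes the libvirt Python bindings, where a virDomain exposes vcpuPinInfo(flags) (one tuple of booleans per vCPU) and pinVcpu(vcpu, cpumap). The plan format and the helper names (cpumap_for, apply_plan, FakeDomain) are hypothetical.

```python
def cpumap_for(cpus, n_host_cpus):
    """Build a libvirt-style cpumap: one bool per host CPU, True where allowed."""
    return tuple(i in cpus for i in range(n_host_cpus))


def apply_plan(dom, plan, n_host_cpus):
    """Pin the vCPUs of `dom` per `plan`, skipping vCPUs that already match.

    `plan` is a hypothetical format: {vcpu_index: set_of_host_cpu_ids}.
    Returns the vCPU indices that were actually re-pinned.
    """
    current = dom.vcpuPinInfo(0)  # current affinity, one tuple of bools per vCPU
    changed = []
    for vcpu, cpus in sorted(plan.items()):
        wanted = cpumap_for(cpus, n_host_cpus)
        if tuple(current[vcpu]) == wanted:
            continue  # already pinned as planned: no libvirt call needed
        dom.pinVcpu(vcpu, wanted)
        changed.append(vcpu)
    return changed


class FakeDomain:
    """Hypothetical stand-in for libvirt.virDomain, so the sketch runs without a hypervisor."""

    def __init__(self, pins):
        self.pins = [tuple(p) for p in pins]

    def vcpuPinInfo(self, flags):
        return list(self.pins)

    def pinVcpu(self, vcpu, cpumap):
        self.pins[vcpu] = tuple(cpumap)
```

On a 4-CPU host where vCPU 0 is already pinned to CPU 0 and vCPU 1 floats, applying the plan {0: {0}, 1: {1}} re-pins only vCPU 1; a second run with the same plan makes no calls at all, which is what lets the tool run statelessly from cron or a hook.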
In the end, each vCPU in a VM is pinned to one Linux CPU (i.e. a physical core, or a single hardware thread when hyperthreading is available), and the problems mentioned above are alleviated or solved entirely.
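To know which Linux CPUs are sibling threads of the same core, the host topology has to be grouped by socket and core. A minimal sketch, assuming the topology is read from `lscpu -p=CPU,CORE,SOCKET` (whether Pinhead reads it this way or from sysfs is not stated here); parse_lscpu and read_host_topology are hypothetical names, and SAMPLE is illustrative output for a hypothetical 2-socket, 2-core, hyperthreaded host.

```python
import subprocess
from collections import defaultdict


def parse_lscpu(text):
    """Group Linux CPU ids as {socket: {core: [cpu, ...]}}.

    Expects `lscpu -p=CPU,CORE,SOCKET` output; lines starting with '#' are comments.
    """
    topo = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cpu, core, socket = (int(field) for field in line.split(","))
        topo[socket][core].append(cpu)
    # Plain dicts with sorted thread lists, so results are predictable.
    return {s: {c: sorted(t) for c, t in cores.items()} for s, cores in topo.items()}


def read_host_topology():
    """Run lscpu on the local host (Linux only) and parse its output."""
    out = subprocess.run(["lscpu", "-p=CPU,CORE,SOCKET"],
                         capture_output=True, text=True, check=True).stdout
    return parse_lscpu(out)


# Illustrative output: CPUs 0-3 are the first thread of each core, 4-7 their siblings.
SAMPLE = """# CPU,Core,Socket
0,0,0
1,1,0
2,2,1
3,3,1
4,0,0
5,1,0
6,2,1
7,3,1
"""
```

Here parse_lscpu(SAMPLE) groups CPUs 0 and 4 as the two threads of core 0 on socket 0. Without hyperthreading, each core simply lists one CPU.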

Our "optimal strategy" is defined as follows.
We loop over all running domains ordered by number of active vCPUs.
For each VM, we select the socket with the lowest load (i.e. the lowest number of already allocated cores) and map as many cores as necessary to its vCPUs, keeping them as close to each other in hardware as possible. For 2 vCPUs, this usually means 2 threads on the same physical core. For 4 or more vCPUs, we use as many pairs of threads (or whole cores, if hyperthreading is not enabled) as necessary, preferably on the same socket. A single vCPU is pinned to one thread of a core, and the other half of the core is left free.
This strategy maximizes cache sharing and balances load across sockets and cores.
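The strategy above can be sketched as follows, using a {socket: {core: [threads]}} topology. This is a hypothetical illustration, not Pinhead's implementation. Assumptions: domains are processed largest first (the text only says "ordered by number of active vCPUs"), whole cores are handed out so an odd vCPU count leaves the last sibling idle, vCPUs spill to the next least-loaded socket when the chosen one is full, and the host is not overcommitted (the sketch raises an error instead).

```python
def plan_pinning(topology, vms):
    """Map each VM's vCPUs to Linux CPU ids (hypothetical helper).

    topology: {socket: {core: [thread cpu ids]}}
    vms: {vm_name: number_of_vcpus}
    Returns {vm_name: [cpu for vCPU 0, cpu for vCPU 1, ...]}.
    """
    free = {s: sorted(cores) for s, cores in topology.items()}  # free core ids per socket
    load = {s: 0 for s in topology}  # allocated cores per socket
    plan = {}
    # Assumption: largest VMs first (ties broken by name for determinism).
    for name, nvcpus in sorted(vms.items(), key=lambda kv: (-kv[1], kv[0])):
        # Least-loaded socket first; the others are used only as overflow.
        order = sorted(topology, key=lambda s: (load[s], s))
        cpus = []
        for s in order:
            while free[s] and len(cpus) < nvcpus:
                core = free[s].pop(0)  # take a whole core
                load[s] += 1
                # Use as many sibling threads as still needed; a 1-vCPU VM
                # takes one thread and leaves the other half of the core idle.
                cpus.extend(topology[s][core][: nvcpus - len(cpus)])
            if len(cpus) == nvcpus:
                break
        if len(cpus) < nvcpus:
            raise RuntimeError(f"not enough free cores for {name}")
        plan[name] = cpus
    return plan
```

On the hypothetical 2-socket host {0: {0: [0, 4], 1: [1, 5]}, 1: {2: [2, 6], 3: [3, 7]}}, a 4-vCPU VM fills both cores of socket 0, and a 2-vCPU VM then lands on one core of the less-loaded socket 1 as a thread pair.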

The current version of the tool is available at https://github.com/hyves-org/pinhead for anyone who wants to check it out before integration.

Gerrit topic: https://review.openstack.org/#q,topic:bp/auto-cpu-pinning,n,z

Addressed by: https://review.openstack.org/33811
    Adds pinhead support

Marking this blueprint as definition: Drafting. If you are still working on this, please re-submit via nova-specs. If not, please mark as obsolete, and add a quick comment to describe why. --johnthetubaguy (20th April 2014)

Work Items