Change Default Overcommit values for CPU and MEM

Registered by Yuriy Shyyan

The current default overcommit values assume a very specific, optimized use case: high-density hosting.

https://docs.openstack.org/arch-design/design-compute/design-compute-overcommit.html

Furthermore, CTOs and technical leads focused on density will look at the overcommit values, assume the defaults are sane and need no further tuning, and base their budget and bare-metal planning on that density. In reality, these values do not hold up in practice and depend heavily on the use case.

Memory overcommit has been by far the riskiest overcommit. Most of the cloud operators I've run into instantly tell me it is the first setting they change back to 1:1, if not lower. Running out of memory may start killing workloads or, if they are not protected, kill off converged services, and may result in node downtime altogether. Memory overcommit may require KSM and other memory-management tooling such as ballooning, and it needs a consistently light or idle workload to function correctly.

The proposed changes were approved in a nova meeting:

Change CPU default from 16 -> 4 (4:1)
Change MEM default from 1.5 -> 1 (1:1)
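
To make the effect of the two ratios concrete, here is a minimal sketch that multiplies physical capacity by the allocation ratio, ignoring reserved resources for simplicity. The host size (32 cores, 128 GiB) and the helper name `schedulable` are hypothetical and chosen only for illustration.

```python
def schedulable(physical: float, allocation_ratio: float) -> int:
    """Amount the scheduler may hand out: physical capacity times the ratio."""
    return int(physical * allocation_ratio)


# Hypothetical host, chosen only for illustration.
HOST_CPUS = 32
HOST_RAM_MB = 131072  # 128 GiB

# Compare the current defaults (16, 1.5) with the proposed ones (4, 1).
for label, cpu_ratio, ram_ratio in (("current", 16.0, 1.5), ("proposed", 4.0, 1.0)):
    vcpus = schedulable(HOST_CPUS, cpu_ratio)
    ram_mb = schedulable(HOST_RAM_MB, ram_ratio)
    print(f"{label:>8}: {vcpus} vCPUs, {ram_mb} MB RAM schedulable")
```

On this example host, the current defaults let the scheduler place guests totalling 64 GiB more memory than the machine physically has, while the proposed defaults place none beyond it.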

Ideally, the documentation is restructured to explain how to scale these overcommit ratios up to match the desired use case, density, and performance, starting from sane defaults. Engineers can then template out the necessary config values once they have gotten to know the system's capabilities. Rather than working backwards from chaos and mayhem, the goal of the arch design guide should be to let future admins reach the desired state by scaling up from a stable system.

The other point to stress in this part of the documentation is reserved system resources. It is increasingly common to run hyper-converged infrastructure, monitoring services, and distributed storage alongside compute; not the most desirable setup, but a reality. While it is common sense, the overcommit documentation should encourage operators to identify the maximum needs of the system and reserve the required resources so they do not run into further issues, because overcommit and host-side services compete heavily for the same resources. I feel a brief one-line mention of the reserved variable names with a link to https://docs.openstack.org/nova/latest/configuration/config.html should suffice.
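
As a rough illustration of how reservation and overcommit interact, the sketch below assumes schedulable capacity is computed as (total - reserved) * ratio, which is my understanding of how Placement derives capacity from an inventory. The two inputs correspond to what I believe are the Nova options `reserved_host_memory_mb` and `ram_allocation_ratio`; verify the names against the configuration reference linked above. The host size, the 16 GiB reservation, and the helper names are hypothetical.

```python
def schedulable_mb(total_mb: int, reserved_mb: int, ratio: float) -> int:
    """Memory the scheduler may hand out, assuming (total - reserved) * ratio."""
    return int((total_mb - reserved_mb) * ratio)


def shortfall_mb(total_mb: int, reserved_mb: int, ratio: float) -> int:
    """Memory missing if guests and host services all use their full share."""
    demand = schedulable_mb(total_mb, reserved_mb, ratio) + reserved_mb
    return max(0, demand - total_mb)


# Hypothetical converged node: 128 GiB RAM, 16 GiB set aside for
# storage and monitoring daemons running next to the guests.
TOTAL_MB = 131072
RESERVED_MB = 16384

for ratio in (1.5, 1.0):
    print(
        f"ratio {ratio}: schedulable {schedulable_mb(TOTAL_MB, RESERVED_MB, ratio)} MB, "
        f"worst-case shortfall {shortfall_mb(TOTAL_MB, RESERVED_MB, ratio)} MB"
    )
```

Under these assumptions a reservation only protects the host services when the ratio is 1:1; at 1.5 the guests can still claim more than the memory left after the reservation.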

Blueprint information

Status: Not started
Approver: Sylvain Bauza
Priority: High
Drafter: Yuriy Shyyan
Direction: Approved
Assignee: Yuriy Shyyan
Definition: Approved
Series goal: Accepted for yoga
Implementation: Deferred
Milestone target: None

Whiteboard

[20211104 bauzas] Approved as a specless BP during the Nov 2nd Nova meeting
https://review.opendev.org/c/openstack/nova/+/830829

[20220225 bauzas] Implementation hit by FeatureFreeze, please repropose the blueprint/spec for the Zed release.

Implementation change: https://review.opendev.org/c/openstack/nova/+/830829
