Complex (anti-)affinity policies

Registered by Stephen Gordon

To meet typical telecommunication service provider SLAs, the failure
of any single host must not take down more than k VMs in any N+k pool
for a given application. More precisely, since the pools are
dynamically scaled, the requirement is that at no time is more than a
certain proportion of any pool instantiated on the same host.

An N+k pool is a pool of identical, stateless servers, any of which
can handle requests for any user. N is the number of servers required
purely for capacity; k is the additional number required for
redundancy. k is typically greater than 1 to allow for multiple
failures. During normal operation all N+k servers should be running.
Conceptually, a pool in this context is roughly analogous to Nova's
concept of a "server group", though the latter does not support the
type of policy described in this proposal.
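
As a rough illustration of the arithmetic (a hypothetical sketch, not
Nova code), the per-host cap such a pool can tolerate follows directly
from k:

    # Illustrative sketch: relate the pool size N+k to a per-host placement
    # cap. If no host runs more than max_per_host members of the pool, the
    # loss of any single host removes at most max_per_host instances, so
    # capacity N is preserved whenever max_per_host <= k.
    def survives_single_host_failure(n, k, max_per_host):
        """True if an N+k pool capped at max_per_host instances per host
        still has at least N members after a single host failure."""
        pool_size = n + k
        worst_case_loss = min(max_per_host, pool_size)
        return pool_size - worst_case_loss >= n

    assert survives_single_host_failure(10, 2, 2)      # N=10, k=2, cap 2: OK
    assert not survives_single_host_failure(10, 2, 3)  # cap 3 risks capacity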

Affinity/anti-affinity can be expressed pair-wise between VMs, which
is sufficient for a 1:1 active/passive architecture, but an N+k pool
needs something more subtle. Specifying that all members of the
pool should live on distinct hosts is clearly wasteful. Instead,
availability modelling shows that the overall availability of an N+k
pool is determined by the time to detect and spin up new instances,
the time between failures, and the proportion of the overall pool
that fails simultaneously. The OpenStack scheduler needs to provide
some way to control the last of these by limiting the proportion of
a group of related VMs that are scheduled on the same host.
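
A minimal sketch of the kind of per-host check this implies
(hypothetical and simplified, not the actual Nova filter code) might
look like:

    # Reject a candidate host once it already holds the allowed number of
    # members of the same server group.
    def host_passes(host_instance_uuids, group_member_uuids,
                    max_server_per_host=1):
        """True if the host can accept one more member of the group
        without exceeding max_server_per_host."""
        members_on_host = len(set(host_instance_uuids) &
                              set(group_member_uuids))
        return members_on_host < max_server_per_host

With max_server_per_host=1 this degenerates to classic anti-affinity;
larger values bound the proportion of the pool that any one host can
hold.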

Blueprint information

Status: Complete
Approver: Matt Riedemann
Priority: Low
Drafter: Stephen Gordon
Direction: Approved
Assignee: Yikun Jiang
Definition: Approved
Series goal: Accepted for rocky
Implementation: Implemented
Milestone target: rocky-3
Started by: Matt Riedemann
Completed by: Matt Riedemann

Whiteboard

Moving to rocky for discussion; I believe our product team at Huawei has a similar requirement for a type of tolerance/limit/threshold for the number of instances that can be placed on the same host in a soft (anti) affinity policy group. -- mriedem 20180207

Gerrit topic: https://review.openstack.org/#q,topic:bp/complex-soft-anti-affinity-policies,n,z

Addressed by: https://review.openstack.org/546925
    [WIP] Allow Specifying Limit For Soft (Anti-)Affinity Groups

Gerrit topic: https://review.openstack.org/#q,topic:bp/complex-anti-affinity-policies,n,z

Addressed by: https://review.openstack.org/560832
    Add rules column to instance_group_policy table.

Addressed by: https://review.openstack.org/563375
    Add policy to InstanceGroup object and api models.

Addressed by: https://review.openstack.org/563401
    Add policy field to ServerGroup notification object

Addressed by: https://review.openstack.org/567534
    WIP: Microversion 2.63 - Use new format policy in server group

Approved for Rocky. -- mriedem 20180516

Addressed by: https://review.openstack.org/571166
    Change the anti-affinity Filter to adapt to new policy

Addressed by: https://review.openstack.org/571465
    WIP: Adapt _validate_instance_group_policy to new policy model

Addressed by: https://review.openstack.org/573628
    Add InstanceGroupPolicy object

Addressed by: https://review.openstack.org/574240
    Fix all invalid obj_make_compatible test case

Gerrit topic: https://review.openstack.org/#q,topic:bug/1776373,n,z

Addressed by: https://review.openstack.org/579113
    Refactor the policies to policy

Addressed by: https://review.openstack.org/580942
    DNM: use policy create()

Addressed by: https://review.openstack.org/581616
    Address nit in afc7650e64753ab7687ae2c4f2714d4bb78a4e5a

Addressed by: https://review.openstack.org/583434
    Change deprecated policies to policy

The compute API microversion 2.64 and corresponding novaclient change are merged so this is complete for Rocky. -- mriedem 20180718
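
For reference, a minimal sketch of what a 2.64-style create request
could look like (the body shape follows the new single "policy" plus
"rules" format; the endpoint URL, token, group name and rule value
below are placeholders):

    import requests

    NOVA_URL = "http://controller:8774/v2.1"  # placeholder endpoint
    TOKEN = "..."                             # placeholder auth token

    body = {
        "server_group": {
            "name": "db-pool",
            "policy": "anti-affinity",
            "rules": {"max_server_per_host": 2},
        }
    }
    resp = requests.post(
        NOVA_URL + "/os-server-groups",
        json=body,
        headers={
            "X-Auth-Token": TOKEN,
            "OpenStack-API-Version": "compute 2.64",
        },
    )
    print(resp.status_code, resp.json())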
