Cluster Creation and Management

Registered by Pushkar Acharya

Detecting host failures in an OpenStack deployment is important for the following use cases:
1. VMs running on a failed host can be evacuated to minimize disruption.
2. OpenStack services impacted by a failed host can be restarted on healthy nodes.

Distributed consensus is an effective mechanism for detecting host failures and recovering from the resulting service disruptions. In addition, it has other applications, such as:
1. Detecting network partitions in a datacenter environment.
2. Providing liveness detection for VMs and hosts, which can be used by orchestration services in OpenStack. This method is much faster than OpenStack's heartbeat-based mechanisms.
3. Providing a distributed key-value store along with service discovery, which can be used for liveness detection of OpenStack services.
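
The liveness and partition checks listed above can be illustrated with a minimal sketch. It is not a consensus protocol and is not taken from this blueprint: it only shows, from a single observer's point of view, how heartbeat timeouts and a strict-majority rule separate "a host has failed" from "this node may be on the minority side of a partition". The class name `LivenessTracker`, the 5-second timeout, and the host names are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class LivenessTracker:
    """Hypothetical helper: tracks the last heartbeat seen from each member."""
    members: set
    timeout: float = 5.0  # assumed: seconds of silence before a host is suspect
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, host, now=None):
        # Record a heartbeat; `now` can be injected to make tests deterministic.
        self.last_seen[host] = time.monotonic() if now is None else now

    def alive(self, now=None):
        # Members whose most recent heartbeat is within the timeout window.
        now = time.monotonic() if now is None else now
        return {
            h for h in self.members
            if h in self.last_seen and now - self.last_seen[h] <= self.timeout
        }

    def failed(self, now=None):
        # Members that are silent or were never heard from.
        return self.members - self.alive(now)

    def has_quorum(self, now=None):
        # Strict majority of the configured membership is still reachable.
        # Without it, this observer may itself be partitioned away.
        return len(self.alive(now)) > len(self.members) // 2


tracker = LivenessTracker(members={"host-a", "host-b", "host-c"})
for h in ("host-a", "host-b", "host-c"):
    tracker.heartbeat(h, now=0.0)
# host-c goes silent; the other two keep reporting.
tracker.heartbeat("host-a", now=10.0)
tracker.heartbeat("host-b", now=10.0)
```

In this sketch, evacuation or service restarts would only be triggered for the hosts returned by `failed()` while `has_quorum()` is true. Losing quorum suggests the observer cannot tell a mass failure from its own isolation.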

Creating and monitoring a cluster across multiple hosts to detect host failures and network partitions can be difficult and involves manual intervention. If host failures are permanent, the cluster must be manually reconfigured according to the hosts that remain available.
This blueprint proposes a new service that can set up, monitor, and reconfigure such distributed consensus clusters. In summary:
1. The service will set up and monitor the cluster on the selected hosts and attempt to address the problems mentioned earlier.
2. As hosts are added to or removed from the OpenStack environment, it will automatically re-establish the distributed quorum.
3. In case of failures, it will remove and fence off defective nodes until the problem is fixed.
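
The reconfiguration behaviour summarized above can be sketched as a single reconciliation step. This is a hedged illustration under stated assumptions, not the proposed service's design: the function `reconcile`, the `Plan` type, and the host names are hypothetical, and the blueprint does not specify how the inventory, the membership list, or the failure set would be obtained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical result of one reconciliation pass."""
    join: frozenset    # hosts to add to the cluster
    remove: frozenset  # hosts to drop from the membership
    fence: frozenset   # failed members to isolate until repaired


def reconcile(inventory, members, failed):
    """Compare the OpenStack host inventory with the current cluster members."""
    inventory, members, failed = set(inventory), set(members), set(failed)
    fence = members & failed                # failed hosts still in the cluster
    remove = (members - inventory) | fence  # decommissioned or fenced hosts
    join = inventory - members - failed     # healthy hosts not yet in the cluster
    return Plan(frozenset(join), frozenset(remove), frozenset(fence))


def quorum_size(member_count):
    # Smallest strict majority for a cluster of the given size.
    return member_count // 2 + 1


plan = reconcile(
    inventory={"h1", "h2", "h3", "h4"},  # hosts currently known to OpenStack
    members={"h1", "h2", "h3", "h5"},    # hosts currently in the cluster
    failed={"h3"},                       # hosts reported as failed
)
```

Here `h4` would be joined, `h5` removed because it left the inventory, and `h3` both removed and fenced. A real service would apply such a plan in small steps so that the cluster never drops below `quorum_size` of its current membership while changes are in flight.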

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: Pushkar Acharya
Direction: Needs approval
Assignee: None
Definition: Drafting
Series goal: None
Implementation: Unknown
Milestone target: None
