Readme-Wiki

Registered by Vinod Pandarinathan

CloudPulse - Openstack Health Service

Openstack Health Service
Contents
1 Introduction
2 When is openstack healthy ?
3 Requirements
4 Different type of health checks
5 Application health tests
6 Operator health tests
7 Extensions

Introduction
Cloud applications such as VNFs, VNFMs, and NSO have stringent SLAs and must be highly available, with an uptime of 99.9% or higher. Application availability depends on the cloud infrastructure, so these applications need to be aware of the health of the OpenStack services. When an infrastructure failure is detected early, the applications can be moved to a different cloud and the cloud operators can be notified. The key takeaway is to catch and handle failures before the customer experiences an application failure.

When is openstack healthy ?
All OpenStack services receive queries and reply with the expected result.
Packets can be sent and received on the tenant and external networks.
Requirements
Provide a tool that checks the health of the cloud.

Should be lightweight, non-disruptive, and use few resources.
Should provide configurable functional testing.
Should verify resource states after an OpenStack upgrade.
Should work on all OpenStack installs, i.e., it should be agnostic to the OpenStack distribution and deployment model.
Should work for both tenants and operators.
Should provide both a CLI and an API.
Different type of health checks
1. Operator test

Check that all services are running and listening on their ports.
Check the cluster status of the infrastructure components RabbitMQ and Percona (MySQL 'wsrep' status and rabbitmqctl cluster_status).
If OpenStack is in HA mode, test HAProxy and each of the services behind it (run the two checks above).
If Pacemaker is installed, use 'crm status' or 'pcs status'.
Requires cloud-admin and operator access.
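
The first operator check (services listening on their ports) could be sketched as follows. This is a minimal illustration, not CloudPulse's implementation: the SERVICE_PORTS map uses commonly seen default API ports and is an assumption, as are the function names.

```python
import socket

# Assumed service-to-port map (common defaults); adjust to the deployment.
SERVICE_PORTS = {
    "keystone": 5000,
    "glance": 9292,
    "nova": 8774,
    "neutron": 9696,
    "cinder": 8776,
}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host, ports=SERVICE_PORTS, probe=is_listening):
    """Map each service name to True (listening) or False (not reachable)."""
    return {name: probe(host, port) for name, port in ports.items()}
```

Calling check_services("127.0.0.1") on a controller would return one boolean per service. The probe parameter is there so the check can be exercised without a live cloud.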

2. Endpoint test

keystone service-list
glance image-list
cinder list
nova list
neutron net-list
Log in to the Horizon page
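
The CLI part of the endpoint test can be sketched as a loop over the commands listed above, treating a zero exit status as a pass. This is only an illustration: it assumes the clients are installed and that credentials are already present in the environment, and the names run_command and endpoint_test are hypothetical. The Horizon login is not covered.

```python
import subprocess

# Commands taken from the list above; each is expected to exit with status 0.
ENDPOINT_COMMANDS = [
    ["keystone", "service-list"],
    ["glance", "image-list"],
    ["cinder", "list"],
    ["nova", "list"],
    ["neutron", "net-list"],
]


def run_command(argv, timeout=30):
    """Run one CLI command; return True if it exits with status 0."""
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hung endpoint both count as a failure.
        return False
    return completed.returncode == 0


def endpoint_test(commands=ENDPOINT_COMMANDS, runner=run_command):
    """Return {command string: passed} for every endpoint command."""
    return {" ".join(argv): runner(argv) for argv in commands}
```
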
3. Functional test

Create a tenant, create a network, upload an image, create two VMs, and run ping between the VMs.
Create a VM, create a volume, and attach the volume to the VM.
Detach the volume, delete the volume, and delete the VM.
Clean up all resources.
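
One way to guarantee the final clean-up step, even when an earlier step fails, is to register a delete action as each resource is created and unwind them in reverse order. The sketch below shows that pattern only; the (name, create, delete) step tuples are hypothetical stand-ins for real OpenStack client calls.

```python
from contextlib import ExitStack


def _cleanup(name, delete, resource, log):
    """Delete one resource and record that it was removed."""
    delete(resource)
    log.append(f"deleted {name}")


def run_scenario(steps):
    """Run (name, create, delete) steps in order and always clean up.

    Resources are deleted in reverse creation order, whether or not a
    step failed. Returns (passed, log).
    """
    log = []
    passed = True
    with ExitStack() as stack:
        for name, create, delete in steps:
            try:
                resource = create()
            except Exception as exc:
                log.append(f"FAIL {name}: {exc}")
                passed = False
                break
            log.append(f"created {name}")
            # Callbacks run last-in, first-out when the with block exits.
            stack.callback(_cleanup, name, delete, resource, log)
    return passed, log
```
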
4. Comprehensive health test

Create a VM on each compute node and ping the gateway.
Determine the maximum MTU and check jumbo packets (optional).
Check security groups (ping, SSH, and HTTP traffic).
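
The maximum MTU can be found with a binary search over packet sizes, as in the sketch below. The search itself is generic. The ping_probe helper is an assumption: it relies on the Linux iputils ping flags -M do (forbid fragmentation), -s (payload size), -c and -W, and it subtracts 28 bytes for the IPv4 and ICMP headers. The 576 and 9000 bounds are also illustrative.

```python
import subprocess


def find_max_mtu(probe, low=576, high=9000):
    """Return the largest MTU in [low, high] for which probe(mtu) is True.

    probe(mtu) must return True when a packet of that size passes
    unfragmented. Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


def ping_probe(host):
    """Build a probe that sends one unfragmentable ping of the given MTU."""
    def probe(mtu):
        payload = mtu - 28  # 20-byte IPv4 header + 8-byte ICMP header
        cmd = ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(payload), host]
        try:
            done = subprocess.run(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
        except OSError:
            return False
        return done.returncode == 0
    return probe
```

With a reachable gateway address, find_max_mtu(ping_probe(gateway)) needs about 14 pings to cover the default range.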
5. Upgrade test

Create, or snapshot the state of, existing OpenStack resources such as tenants, routers, VMs, and load balancers.
After the upgrade, check that the created or snapshotted resources are in an operational state.
Check security groups after the upgrade (ping, SSH, and HTTP).
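
The snapshot-and-compare idea can be illustrated with a plain mapping of resource IDs to states. This is a sketch under assumptions: the resource IDs and the "ACTIVE" and "DOWN" status strings are illustrative, and gathering the states from the cloud is left out.

```python
import json


def snapshot(resources):
    """Serialize {resource_id: status} as captured before the upgrade."""
    return json.dumps(resources, sort_keys=True)


def compare(before_json, after):
    """Return a list of problems found after the upgrade (empty if none)."""
    before = json.loads(before_json)
    problems = []
    for rid, status in sorted(before.items()):
        if rid not in after:
            problems.append(f"{rid}: missing after upgrade")
        elif after[rid] != status:
            problems.append(f"{rid}: {status} -> {after[rid]}")
    return problems
```

The JSON string returned by snapshot can be written to disk before the upgrade and read back afterwards.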

Application health tests
Applications can make use of the endpoint, comprehensive, functional, and upgrade checks. An application can snapshot its resources before an upgrade and then check their state after the upgrade. CloudPulse itself can run as a tenant VM, which can then provide REST API access to other NFV, VNFM, and NFVO applications.

Operator health tests
Operators can install CloudPulse on one of the controllers, either directly or in a Docker container. They should be able to run all of the health checks listed above.

Extensions
CloudPulse is extensible: both operator and API tests can be added to CloudPulse as pluggable modules. Extensions of interest at this time include Nagios/Ganglia for operators and VNFM-specific tests for applications.
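
A pluggable check could look like the registry sketch below. This is not CloudPulse's actual plugin mechanism; the health_check decorator, REGISTRY, and run_all are hypothetical names used only to show how new tests might be added without changing the runner.

```python
REGISTRY = {}


def health_check(name):
    """Decorator that registers a check function under `name`."""
    def register(func):
        REGISTRY[name] = func
        return func
    return register


@health_check("always_ok")
def always_ok():
    """Trivial example check that always passes."""
    return True


def run_all(registry=REGISTRY):
    """Run every registered check; an exception counts as a failure."""
    results = {}
    for name, func in registry.items():
        try:
            results[name] = bool(func())
        except Exception:
            results[name] = False
    return results
```
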

Blueprint information

Status: Complete
Approver: Vinod Pandarinathan
Priority: High
Drafter: Vinod Pandarinathan
Direction: Approved
Assignee: Vinod Pandarinathan
Definition: Approved
Series goal: Accepted for liberty
Implementation: Implemented
Milestone target: liberty-1
Started by: Vinod Pandarinathan
Completed by: Vinod Pandarinathan
