Readiness and liveness probes for the CNI daemon

Registered by Antoni Segura Puimedon

The CNI daemon is on its way to becoming the default CNI component in kuryr-kubernetes deployments, where it is usually deployed as a DaemonSet. It is therefore increasingly necessary to give Kubernetes the means to know when the CNI daemon is in good shape, so that it can restart the daemon when problems arise. The best way to do that is through liveness and readiness checks.

For the CNI daemon to be considered healthy, at least the following must be true:
- NET_ADMIN capabilities are present.
- Depending on the VIF binding, the OVS br-int bridge is present.
- IPDB is in working order. Ideally, leaks would also be detected and the daemon marked unhealthy if they get out of hand.
- The connection to the Kubernetes API used for the Watch is working.
- CNI ADD failures stay within a configurable maximum. Exceeding it should probably mark the CNI daemon as unhealthy so that it is restarted.
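Some of the conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code that was merged for this blueprint. The helper names (has_net_admin, bridge_present, AddFailureTracker) are hypothetical. The capability check assumes the effective capability mask is read from the CapEff line of /proc/self/status, and the bridge check assumes the bridge shows up under /sys/class/net.

```python
import os

# Bit index of CAP_NET_ADMIN as defined in linux/capability.h.
CAP_NET_ADMIN = 12


def has_net_admin(status_text):
    """Return True if the CapEff mask in a /proc/<pid>/status dump
    has the CAP_NET_ADMIN bit set."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal string.
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool(mask & (1 << CAP_NET_ADMIN))
    # No CapEff line found: treat the capability as missing.
    return False


def bridge_present(name="br-int", sysfs_net="/sys/class/net"):
    """Return True if a network device with the given name is listed
    under sysfs (assumed location; only relevant for OVS bindings)."""
    return os.path.exists(os.path.join(sysfs_net, name))


class AddFailureTracker:
    """Hypothetical counter of CNI ADD failures against a configurable
    maximum; the daemon is reported unhealthy once it is exceeded."""

    def __init__(self, max_failures):
        self.max_failures = max_failures
        self.failures = 0

    def record_failure(self):
        self.failures += 1

    def healthy(self):
        return self.failures <= self.max_failures


def read_own_status(path="/proc/self/status"):
    """Read the status file of the current process (Linux only)."""
    with open(path) as status_file:
        return status_file.read()
```

A liveness handler could call has_net_admin(read_own_status()), bridge_present() and the tracker's healthy() method, and report failure to Kubernetes as soon as any of them returns False. Checks for IPDB and for the Kubernetes API watch are left out here because they depend on daemon internals that this blueprint does not describe.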

Blueprint information

Status: Complete
Approver: Daniel Mellado
Priority: Undefined
Drafter: Antoni Segura Puimedon
Direction: Needs approval
Assignee: Maysa de Macedo Souza
Definition: New
Series goal: None
Implementation: Implemented
Milestone target: None
Started by: Antoni Segura Puimedon
Completed by: Antoni Segura Puimedon

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/cni-daemon-readiness-liveness,n,z

Addressed by: https://review.openstack.org/537721
    [WIP] Add readiness and liveness checks to CNI.

Gerrit topic: https://review.openstack.org/#q,topic:health_caps_procfs,n,z

Addressed by: https://review.openstack.org/549276
    cni health: Avoid capsh dependency
