NVMeoF RAID volume healing agent

Registered by Zohar Mamedov on 2020-11-02

The agent is responsible for self-healing of NVMeoF client RAID volumes.

It will initially support block storage exposed by the KumoScale volume driver, which is used by the NVMeoF connector. It should be implemented so that support can later be added for any other block storage that uses RAID on the client.

* Monitors RAID device states
* Informs the volume provisioner of degraded devices and waits for new replicas
* Detaches degraded replicas and attaches newly provisioned replicas to the volume RAID array
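
The monitoring step above could be sketched as follows. This is only an illustration and not the agent's actual code: it assumes the client RAID arrays are Linux MD arrays whose state is visible in /proc/mdstat, and the sample text and the function name find_degraded are made up for the example.

```python
import re

# Sample /proc/mdstat content (illustrative; device names are made up).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 nvme1n1[1] nvme0n1[0](F)
      1048512 blocks super 1.2 [2/1] [_U]

md1 : active raid1 nvme3n1[1] nvme2n1[0]
      1048512 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
MEMBER_RE = re.compile(r"(\S+?)\[\d+\](\(F\))?")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]")


def find_degraded(mdstat_text):
    """Map each degraded array name to its members flagged as failed (F)."""
    degraded = {}
    current = None
    failed = []
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            members = MEMBER_RE.findall(header.group(4))
            failed = [name for name, flag in members if flag]
            continue
        status = STATUS_RE.search(line)
        if current and status:
            # "[expected/active]": fewer active than expected means degraded.
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = failed
            current = None
    return degraded


print(find_degraded(SAMPLE_MDSTAT))
```

A real agent would read /proc/mdstat (or query mdadm) periodically and pass each degraded array to the provisioner notification step.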

An overview is available in these slides (see the diagram on slide 12):
https://docs.google.com/presentation/d/18zBxXfDTOieuD-lQmz4Cx2xBiTErjfXy9eRsMuYZxJI/edit#slide=id.ga3ee8ff6c3_0_29

Blueprint information

Status:
Started
Approver:
Gorka Eguileor
Priority:
Low
Drafter:
Zohar Mamedov
Direction:
Approved
Assignee:
Zohar Mamedov
Definition:
Approved
Series goal:
Accepted for xena
Implementation:
Good progress
Milestone target:
xena-1
Started by
Brian Rosmaita

Whiteboard

This is a new os-brick connector. Its connect_volume method will also handle replacements when the volumes referenced by the connection_info are not available.

This is useful because the connection_info stored in Cinder is not updated, but all the volumes it references may have been replaced over time, which would make an attach on instance reboot fail.
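
A minimal sketch of that replacement step, under stated assumptions: the "volume_replicas" key, the replica dictionaries, and the two callables are hypothetical names chosen for illustration, not the connector's actual interface.

```python
def resolve_replicas(connection_info, is_available, request_replacement):
    """Return the replicas to assemble, swapping out unavailable ones.

    connection_info: dict with a hypothetical "volume_replicas" list.
    is_available: callable(replica) -> bool, e.g. a target reachability check.
    request_replacement: callable(replica) -> replica, asks the provisioner
        for a new replica to stand in for an unavailable one.
    """
    resolved = []
    for replica in connection_info["volume_replicas"]:
        if is_available(replica):
            resolved.append(replica)
        else:
            # The stored connection_info is stale for this replica.
            resolved.append(request_replacement(replica))
    return resolved


# Usage with stand-in callables: replica "b" no longer exists.
info = {"volume_replicas": [{"id": "a"}, {"id": "b"}]}
result = resolve_replicas(
    info,
    is_available=lambda r: r["id"] != "b",
    request_replacement=lambda r: {"id": r["id"] + "-new"},
)
print(result)
```

Passing the availability check and the provisioner call in as callables keeps the sketch independent of any particular backend.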

Addressed by: https://review.opendev.org/c/openstack/os-brick/+/768576
    Add NVMeOF monitoring and healing agent

Open issues on which we most need feedback:

1. How to start the agent (thread, process, or service?)
Please comment inline on the current implementation (nvmeof_agent.py L96).

2. Agent process persistence (also related to #1)
We would like the agent to keep running after a service (such as nova-compute) has been restarted.
I would love a general discussion of the architecture here.

3. Backend-specific REST client dependency
Currently the REST client library is simply placed in the same directory as the agent, which is likely not good practice.
We are working on uploading the REST client to PyPI. Would we be able to add this package as a dependency only for deployments that use this NVMeoF agent?
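
One common pattern for an optional dependency of this kind is a guarded import at the point of use. The sketch below is only one possible approach, not a decision for this blueprint, and the module name in the usage lines is a hypothetical placeholder because the package name is not stated here.

```python
import importlib


def load_backend_client(module_name):
    """Import an optional backend client module, or return None if absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The package is not installed in this deployment.
        return None


# "hypothetical_backend_rest_client" is a placeholder, not a real package.
client = load_backend_client("hypothetical_backend_rest_client")
if client is None:
    print("backend REST client not installed; healing agent disabled")
```

With this approach, deployments that do not use the agent never need the package installed, and the agent can report a clear error when it is missing.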


Work Items
