Enable cleaning in undercloud Ironic

Registered by Dmitry Tantsur

The cleaning process in Ironic ensures that a node is in a sane state before it becomes available for the next deployment. It runs before a node becomes "available" for the first time, and then after every tear down. As it can be time consuming, we've had it disabled in TripleO for quite some time.
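Since cleaning runs on the way to "available", anything that waits for that state needs a generous timeout. A minimal sketch of such a wait loop follows. The `get_state` callable is hypothetical and stands in for whatever call returns the node's provision state; the 30 minute default timeout is an arbitrary assumption, not a recommended value.

```python
import time


def wait_until_available(get_state, timeout=1800.0, interval=10.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "available".

    get_state is a hypothetical callable returning the node's current
    provision state as a string; it is not a real Ironic client API.
    sleep and clock are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == "available":
            return state
        if state == "clean failed":
            # Cleaning will not finish on its own from this state.
            raise RuntimeError("cleaning failed; node needs operator attention")
        if clock() >= deadline:
            raise TimeoutError(f"node still in {state!r} after {timeout} seconds")
        sleep(interval)
```

The loop treats "clean failed" as terminal and any other state (for example "cleaning" or "clean wait") as still in progress.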

However, this has at least one very nasty consequence: if partitioning is not cleaned from all disks, a deployment can end up randomly reusing partitions from previous deployments. E.g. I've encountered a bug where a deployment ended up with two config drive partitions. I'm not sure if we should get back to upstream defaults and enable cleaning, but we at least need it well supported and documented.

Rough action items:
1. Ensure enrolling workflows are ready for manageable->available transitions to take minutes, not seconds. Document this as normal behaviour.
2. Configure cleaning network to "ctlplane" (requires https://bugs.launchpad.net/ironic/+bug/1614938).
3. Configure Ironic to run only metadata cleaning by default (set ironic::conductor::cleaning_disk_erase to 'metadata'), as full cleaning takes insanely long (up to 1 hour) on virtual machines.
4. Modify one of the CI jobs to run with cleaning.
5. Optional: enable cleaning by default (set clean_nodes to True by default in undercloud.conf)
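As an illustration of item 5, here is a minimal sketch that flips clean_nodes in the [DEFAULT] section of an undercloud.conf-style file. The sample contents and the `set_clean_nodes` helper are hypothetical; only the clean_nodes option name comes from the list above.

```python
import configparser
import io


def set_clean_nodes(conf_text, enabled=True):
    """Return the config text with clean_nodes set in [DEFAULT].

    Note: configparser drops comments and lower-cases option names when
    rewriting, so this is only suitable as an illustration.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    parser["DEFAULT"]["clean_nodes"] = "True" if enabled else "False"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical sample file; a real undercloud.conf has many more options.
sample = "[DEFAULT]\nlocal_interface = eth1\nclean_nodes = False\n"
print(set_clean_nodes(sample))
```

Item 3 is a Puppet parameter rather than an undercloud.conf option, so it is not covered by this sketch.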

Blueprint information

Emilien Macchi
Dmitry Tantsur
Dmitry Tantsur
Series goal:
Accepted for ocata
Milestone target:
ocata-3
Started by
Dmitry Tantsur
Completed by
Dmitry Tantsur

Gerrit topic: https://review.openstack.org/#q,topic:bp/re-enable-cleaning,n,z

Addressed by: https://review.openstack.org/400219
    Only erase disk metadata if automated cleaning is enabled

Gerrit topic: https://review.openstack.org/#q,topic:bp/raid-workflow,n,z

Addressed by: https://review.openstack.org/405379
    Create a workflow for running manual cleaning on nodes

Addressed by: https://review.openstack.org/406197
    Set Ironic cleaning network to ctlplane

CI update: https://review.openstack.org/#/c/408010/

UPD: cleaning causes strange failures in CI. Postponing this subtask until better times.

Addressed by: https://review.openstack.org/418356
    Document using manual_cleaning workflow to wipe hard drives

