Add a nova-audit service for periodic maintenance

Registered by melanie witt on 2020-01-30

Excerpts taken from the spec proposal: https://review.opendev.org/693226

Nova is a distributed system, which means that things fail in strange
ways and data stored across multiple systems gets out of sync with the
actual state of reality. Hosts and instances come and go, along with
network connectivity, the message bus and database. Recently we have
gained a number of "heal $thing" routines that operators can run
either periodically or on demand to synchronize the states of various
services and data stores to resolve or prevent problems.

In most cases, these tasks are idempotent and safe to run even when
nothing is wrong. Operators need a single mechanism for running these
maintenance and healing tasks periodically in the background, with
minimal impact on runtime performance, so that inconsistencies are
fixed before they become acute enough to require human intervention.

We already have a number of these maintenance activities codified in
one-shot commands that can be run on demand once a problem has been
identified. Since most of them are neither harmful nor overly
expensive, we should be able to run them periodically to fix problems
automatically before the operator gets involved.

This spec proposes a new binary called ``nova-audit`` to encapsulate
these tasks. Ideally it should be usable in multiple ways:

- As a singleton daemon that periodically runs tasks at various
  intervals according to their potential impact on the system and
  need.
- As a one-shot "fix stuff" command that can be run from cron or
  otherwise scheduled or executed.
- As a daemon or one-shot command that purely audits potential
  problems, but makes no changes.
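
The three modes above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the design from the
spec: ``AuditTask``, ``run_once`` and ``run_daemon`` are hypothetical
names, and each task is assumed to be a callable that accepts a
``dry_run`` flag and returns the number of problems it found.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AuditTask:
    """Hypothetical wrapper around one maintenance task (not a Nova class)."""
    name: str
    interval: float                  # seconds between runs in daemon mode
    func: Callable[[bool], int]      # func(dry_run) -> problems found
    next_run: float = 0.0            # monotonic time of the next scheduled run


def run_once(tasks: List[AuditTask], dry_run: bool = False) -> Dict[str, int]:
    """One-shot mode: run every task once and report problems per task."""
    return {task.name: task.func(dry_run) for task in tasks}


def run_daemon(tasks: List[AuditTask], dry_run: bool = False, tick: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep,
               max_ticks: Optional[int] = None) -> None:
    """Daemon mode: run each task whenever its own interval has elapsed.

    max_ticks, clock and sleep exist only so the loop can be bounded and
    driven by a fake clock in tests; a real daemon would loop forever.
    """
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        now = clock()
        for task in tasks:
            if now >= task.next_run:
                task.func(dry_run)
                task.next_run = now + task.interval
        sleep(tick)
        ticks += 1
```

Passing ``dry_run=True`` to either function corresponds to the
audit-only mode: the task is expected to report problems without
changing anything.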

A new config section of ``[audit]`` would be added with timers and
default values for each task.
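
As a rough illustration of per-task timers with defaults, the sketch
below reads an ``[audit]`` section and falls back to built-in values.
The option names and default intervals are invented for this example;
the blueprint does not list them. Nova itself defines options through
oslo.config, and the standard-library ``configparser`` is used here only
to keep the example self-contained.

```python
import configparser
from typing import Dict

# Hypothetical option names and defaults (in seconds); none of these
# come from the spec.
DEFAULT_INTERVALS: Dict[str, int] = {
    "discover_hosts_interval": 300,
    "heal_allocations_interval": 3600,
    "archive_deleted_rows_interval": 86400,
}


def load_audit_intervals(config_text: str) -> Dict[str, int]:
    """Return per-task intervals, letting [audit] override the defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    intervals = dict(DEFAULT_INTERVALS)
    if parser.has_section("audit"):
        for option in DEFAULT_INTERVALS:
            if parser.has_option("audit", option):
                intervals[option] = parser.getint("audit", option)
    return intervals
```

An operator who wants allocations healed more often would then set only
that one option and leave the others at their defaults.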

Blueprint information

Status: Started
Approver: Sylvain Bauza
Priority: Medium
Drafter: melanie witt
Direction: Approved
Assignee: melanie witt
Definition: Approved
Series goal: Accepted for ussuri
Implementation: Good progress
Milestone target: None
Started by: melanie witt on 2020-03-11

Whiteboard

Spec: https://review.opendev.org/#/c/693226/

[efried 20200214] Spec approved

Gerrit topic: https://review.opendev.org/#/q/topic:bp/nova-audit

Addressed by: https://review.opendev.org/708783
    Move nova-manage db purge to nova-audit

Addressed by: https://review.opendev.org/708784
    Move nova-manage db archive_deleted_rows to nova-audit

Addressed by: https://review.opendev.org/708785
    Move nova-manage cell_v2 discover_hosts to nova-audit

Addressed by: https://review.opendev.org/708786
    Move nova-manage cell_v2 map_instances to nova-audit

Addressed by: https://review.opendev.org/708787
    Move nova-manage placement sync_aggregates to nova-audit

Addressed by: https://review.opendev.org/708788
    Move nova-manage placement heal_allocations to nova-audit

[efried 20200220] Agreed in the Nova meeting to Direction:Approve all Definition:Approved blueprints http://eavesdrop.openstack.org/meetings/nova/2020/nova.2020-02-20-14.00.log.html#l-131
