Handling a Down Cell

Registered by Surya Seetharaman

(1) Nova should be able to list instances and services even when it cannot connect to the database of a single cell. Currently it cannot.

(2) Similarly, quota calculations for VM creation currently ignore resources in a down cell, which is wrong. A plausible solution, as discussed at the PTG, is to block VM creation for users who have VMs in the down cell and to allow it otherwise.
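The tolerant listing described in (1) can be sketched as a scatter-gather over cells with a per-cell timeout: every cell is queried in parallel, and a cell that raises or does not answer in time is recorded as down instead of failing the whole request. This is a minimal Python sketch under stated assumptions, not Nova's implementation: `scatter_gather_cells`, `list_instances` and the two sentinels are hypothetical names, and each cell's database query is modelled as a plain callable.

```python
import concurrent.futures
import time

# Hypothetical sentinels marking a cell whose results are unavailable.
DID_NOT_RESPOND = object()
RAISED_EXCEPTION = object()


def scatter_gather_cells(cell_queries, timeout):
    """Run one query per cell in parallel and collect per-cell results.

    cell_queries maps a cell name to a zero-argument callable standing in
    for that cell's database query (an assumption of this sketch).
    """
    results = {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(cell_queries)))
    try:
        futures = {pool.submit(fn): cell for cell, fn in cell_queries.items()}
        done, not_done = concurrent.futures.wait(futures, timeout=timeout)
        for fut in done:
            try:
                results[futures[fut]] = fut.result()
            except Exception:
                # The cell DB raised (e.g. connection refused).
                results[futures[fut]] = RAISED_EXCEPTION
        for fut in not_done:
            # The cell did not answer within the timeout.
            results[futures[fut]] = DID_NOT_RESPOND
    finally:
        # Do not block the caller on cells that are still hanging.
        pool.shutdown(wait=False)
    return results


def list_instances(cell_queries, timeout=1.0):
    """Merge instances from healthy cells and report which cells were skipped."""
    instances, down_cells = [], []
    for cell, result in sorted(scatter_gather_cells(cell_queries, timeout).items()):
        if result is DID_NOT_RESPOND or result is RAISED_EXCEPTION:
            down_cells.append(cell)
        else:
            instances.extend(result)
    return instances, down_cells


# Usage with three simulated cells.
def healthy():
    return ["vm-1", "vm-2"]


def broken():
    raise ConnectionError("cell DB unreachable")


def hanging():
    time.sleep(0.5)
    return ["vm-3"]


instances, down = list_instances(
    {"cell1": broken, "cell2": healthy, "cell3": hanging}, timeout=0.1)
print(instances, down)  # ['vm-1', 'vm-2'] ['cell1', 'cell3']
```

Returning the down cells alongside the partial list lets the API layer decide whether to skip those cells or to represent their instances some other way, rather than turning one unreachable database into an error for the whole request.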

However, for both scenarios above to work, the nova_api DB needs to record which instances are deleted and which are not, so that this information is still available when a cell goes down. This spec proposes adding a new 'queued_for_delete' column to the instance_mappings table to support the nova list, nova service-list and VM creation (with correct quota) operations.
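How a `queued_for_delete` flag in the API database can serve both scenarios is sketched below: non-deleted mappings in a down cell can be reported as minimal records, and a creation request can be refused when the requester still owns live instances there. This is an illustrative sketch only. The `InstanceMapping` dataclass is a simplified stand-in for Nova's object, `minimal_servers_for_down_cell` and `can_create_server` are hypothetical helpers, the `UNKNOWN` status is an assumption, and the check is keyed by project (as the `get_not_qfd_by_cell_and_project()` review title on the whiteboard suggests) while the spec text speaks of users.

```python
from dataclasses import dataclass


@dataclass
class InstanceMapping:
    """Simplified API-DB row linking an instance to its cell (hypothetical)."""
    instance_uuid: str
    cell_uuid: str
    project_id: str
    queued_for_delete: bool = False


def minimal_servers_for_down_cell(mappings, down_cell):
    """Partial records for live instances whose cell DB cannot be reached.

    Only API-DB data is available, so each record carries the uuid and an
    assumed UNKNOWN status instead of the full server view.
    """
    return [
        {"id": m.instance_uuid, "status": "UNKNOWN"}
        for m in mappings
        if m.cell_uuid == down_cell and not m.queued_for_delete
    ]


def can_create_server(mappings, project_id, down_cells):
    """Refuse creation if the project has live instances in a down cell.

    Their resources cannot be counted, so the quota check would be incomplete.
    """
    return not any(
        m.project_id == project_id
        and m.cell_uuid in down_cells
        and not m.queued_for_delete
        for m in mappings
    )


mappings = [
    InstanceMapping("vm-1", "cell1", "projA"),
    InstanceMapping("vm-2", "cell1", "projA", queued_for_delete=True),
    InstanceMapping("vm-3", "cell1", "projB", queued_for_delete=True),
    InstanceMapping("vm-4", "cell2", "projB"),
]
print(minimal_servers_for_down_cell(mappings, "cell1"))  # [{'id': 'vm-1', 'status': 'UNKNOWN'}]
print(can_create_server(mappings, "projA", {"cell1"}))  # False
print(can_create_server(mappings, "projB", {"cell1"}))  # True
```

Without the flag, the API DB alone cannot tell a deleted instance from a live one, so every mapping in a down cell would have to be treated as live, over-reporting servers and blocking creation for projects whose instances there are already gone.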

Blueprint information

Status: Complete
Approver: Matt Riedemann
Priority: High
Drafter: Surya Seetharaman
Direction: Approved
Assignee: Surya Seetharaman
Definition: Approved
Series goal: Accepted for stein
Implementation: Implemented
Milestone target: stein-3
Started by: Matt Riedemann
Completed by: Matt Riedemann

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/handling-down-cell,n,z

Addressed by: https://review.openstack.org/557369
    Handling a down cell

Addressed by: https://review.openstack.org/566788
    [WIP] Add queued for delete to instance_mappings table.

Addressed by: https://review.openstack.org/566795
    [WIP/POC] Add queued_for_delete field to InstanceMapping object

Addressed by: https://review.openstack.org/566813
    [WIP/POC] Updating queued_for_delete from instance_destroy()

Gerrit topic: https://review.openstack.org/#q,topic:bug/1726301,n,z

Addressed by: https://review.openstack.org/567785
    [POC] Graceful handling of nova-list when a cell is down

Addressed by: https://review.openstack.org/575734
    Make nova list ignore down cells

Addressed by: https://review.openstack.org/575996
    [Ugly POC] Addition of "unavailable_servers" key to GET /servers response

Approved post spec-freeze as a high priority item for Rocky. -- mriedem 20180709

Addressed by: https://review.openstack.org/581243
    Fix nits in the handling down cell spec

Addressed by: https://review.openstack.org/582536
    Online migration tool for populating queued-for-delete

Addressed by: https://review.openstack.org/584504
    Online data migration for queued_for_delete flag

Gerrit topic: https://review.openstack.org/#q,topic:bp/bp,n,z

We're past feature freeze for Rocky, so this must be deferred. Please re-propose the spec for Stein if you'd like to work on it next cycle. -- melwitt 20180727

Addressed by: https://review.openstack.org/591656
    Add get_by_cell_and_project() method to InstanceMappingList

Addressed by: https://review.openstack.org/591657
    API microversion bump for handling-down-cell

Addressed by: https://review.openstack.org/591658
    Return a minimal construct for nova show when a cell is down

Addressed by: https://review.openstack.org/584829
    Return a minimal construct for nova service-list when a cell is down

Addressed by: https://review.openstack.org/592698
    Batch results per cell when doing cross-cell listing

Addressed by: https://review.openstack.org/593717
    List instances from all cells explicitly

Addressed by: https://review.openstack.org/593131
    Make instance_list perform per-cell batching

Addressed by: https://review.openstack.org/594265
    Record cell success/failure/timeout in CrossCellLister

Addressed by: https://review.openstack.org/594570
    Make CELL_TIMEOUT a constant

Addressed by: https://review.openstack.org/594571
    Stash the cell uuid on the context when targeting

Addressed by: https://review.openstack.org/594572
    Make RecordWrapper record RequestContext and expose cell_uuid

Gerrit topic: https://review.openstack.org/#q,topic:failed-cell-list,n,z

Addressed by: https://review.openstack.org/594947
    [WIP] Add scatter_gather_single_cell utility

Addressed by: https://review.openstack.org/595892
    Handling a down cell

Re-approved for Stein. -- mriedem 20180824

Gerrit topic: https://review.openstack.org/#q,topic:bp/api-extensions-merge-stein,n,z

Addressed by: https://review.openstack.org/596285
    Merge extended_volumes extension response into server view builder

Addressed by: https://review.openstack.org/592428
    Making instance/migration listing skipping down cells configurable

Addressed by: https://review.openstack.org/607663
    Modify get_by_cell_and_project() to get_not_qfd_by_cell_and_project()

Addressed by: https://review.openstack.org/607934
    [WIP] Refactor scatter-gather utility to return exception objects

Addressed by: https://review.openstack.org/611665
    Make CellDatabases fixture reentrant

Addressed by: https://review.openstack.org/569055
    [WIP] Make _instances_cores_ram_count() be smart about cells

Addressed by: https://review.openstack.org/614783
    [WIP] Add os_compute_api:servers:create:cell_down policy

Addressed by: https://review.openstack.org/614810
    Add DownCellFixture

Gerrit topic: https://review.openstack.org/#q,topic:bug/1771810,n,z

Addressed by: https://review.openstack.org/635120
    Modify InstanceMappingList.get_not_deleted_by_cell_and_project()

Addressed by: https://review.openstack.org/635121
    Plumbing for ignoring list_records_by_skipping_down_cells

Addressed by: https://review.openstack.org/635145
    Plumbing for allowing the all-tenants filter

Addressed by: https://review.openstack.org/635146
    Plumbing required in ViewBuilder to construct partial results

Addressed by: https://review.openstack.org/635147
    API microversion 2.68: Handles Down Cells Documentation

Addressed by: https://review.openstack.org/637182
    Add context.target_cell() stub to DownCellFixture

Addressed by: https://review.openstack.org/638173
    [Doc Fix]Best practices for effectively tolerating down cells

Addressed by: https://review.openstack.org/650167
    Add testing guide for down cells
