Support High Availability Active-Active configurations in Cinder Volume

Registered by Gorka Eguileor on 2015-10-08

Cinder Volume currently supports High Availability only in Active-Passive configurations.

This blueprint proposes a series of changes to the API and Volume nodes that allow Cinder Volume to support Active-Active configurations as well.

Blueprint information

Status:
Started
Approver:
Sean McGinnis
Priority:
Medium
Drafter:
Gorka Eguileor
Direction:
Approved
Assignee:
Gorka Eguileor
Definition:
Approved
Series goal:
None
Implementation:
Good progress
Milestone target:
None
Started by
Sean McGinnis on 2016-03-10

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/cinder-volume-active-active-support,n,z

**REVIEWING ORDER**
==============

Addressed by: https://review.openstack.org/303021
    Make c-vol use workers table for cleanup

Addressed by: https://review.openstack.org/318573
    Attach/detach calls moved to HA A-A

Addressed by: https://review.openstack.org/344225
    Support A/A in delete operations and get_capabilities

Addressed by: https://review.openstack.org/344226
    Support A/A on Scheduler operations

Addressed by: https://review.openstack.org/346041
    Cosmetic changes to scheduler

Addressed by: https://review.openstack.org/355968
    Add more operations to cluster

Addressed by: https://review.openstack.org/363010
    Allow triggering cleanup from API

Addressed by: https://review.openstack.org/381835
    Make Replication support Active-Active

Addressed by: https://review.openstack.org/386186
    Make Image Volume Cache cluster aware

**SPECIFICATIONS:**
=================

*Merged:*
------------------------

Addressed by: https://review.openstack.org/232599
    Support HA Active/Active configurations

Addressed by: https://review.openstack.org/207101
    Remove Cinder API races

Addressed by: https://review.openstack.org/202615
    Add Tooz locks to support A/A HA spec

Addressed by: https://review.openstack.org/232595
    Job Distribution to support HA A/A

Addressed by: https://review.openstack.org/327283
    Update Job Distribution for A/A Specs

Addressed by: https://review.openstack.org/236977
    Resource cleanup to support HA A/A

Optional: Alternative to DLM locks for drivers that don't require distributed locks
Addressed by: https://review.openstack.org/237602
    Remove Manager Local Locks for HA A/A

*Ready for review:*
-----------------------
Addressed by: https://review.openstack.org/237076
    Auto-fencing to support HA A/A

**API RACES:**
============

*Merged:*
-----------------------

Addressed by: https://review.openstack.org/218012
    Move get_by_id to CinderObject

Addressed by: https://review.openstack.org/205834
    Add atomic conditional updates to objects

Addressed by: https://review.openstack.org/216376
    Improve metadata update operations

Addressed by: https://review.openstack.org/216377
    Remove API races from attach and detach

Addressed by: https://review.openstack.org/205835
    Remove API races from delete methods

Addressed by: https://review.openstack.org/231936
    Add ordering possibilities to conditional update

Addressed by: https://review.openstack.org/216378
    Remove API races on extend and volume_upload_image

Addressed by: https://review.openstack.org/221442
    Remove API races from migrate and retype

Addressed by: https://review.openstack.org/259429
    Remove API races from consistency groups
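
Note on the approach: the review titles above ("Add atomic conditional updates to objects", "Remove API races from ...") point at a compare-and-swap style of update, where a status change is applied only if the row is still in an expected state. The following is a minimal sketch of that general technique, not Cinder's actual code. The `volumes` table, its columns, and the `conditional_update` helper are hypothetical names chosen for illustration, and the standard-library `sqlite3` module stands in for the real database layer.

```python
import sqlite3


def conditional_update(conn, volume_id, new_status, expected_statuses):
    """Set the status only if the current status is one of expected_statuses.

    Hypothetical helper for illustration. The check and the write happen in
    a single UPDATE statement, so two concurrent callers cannot both succeed.
    Returns True if this caller changed the row.
    """
    placeholders = ",".join("?" for _ in expected_statuses)
    cur = conn.execute(
        "UPDATE volumes SET status = ? "
        f"WHERE id = ? AND status IN ({placeholders})",
        (new_status, volume_id, *expected_statuses),
    )
    conn.commit()
    return cur.rowcount == 1


# Illustrative data: one volume in the 'available' state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE volumes (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO volumes VALUES ('vol-1', 'available')")
conn.commit()

# Two delete requests for the same volume: only the first one wins.
first = conditional_update(conn, "vol-1", "deleting", ["available", "error"])
second = conditional_update(conn, "vol-1", "deleting", ["available", "error"])
final_status = conn.execute(
    "SELECT status FROM volumes WHERE id = 'vol-1'"
).fetchone()[0]
```

In this sketch the second request finds the row already in `deleting` and changes nothing, which is the property a read-then-write sequence in the API cannot guarantee when several API nodes run at once.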

*Ready for review:*
-----------------------

*Work In Progress*
-----------------------

Addressed by: https://review.openstack.org/257495
    Remove race conditions from transfer API

Addressed by: https://review.openstack.org/255430
    Remove race conditions from backup API

Addressed by: https://review.openstack.org/277588
    Removed potential races from volume update method

**JOB DISTRIBUTION**
=================

*Ready for review:*
-----------------------
Addressed by: https://review.openstack.org/318573
    Attach/detach calls moved to HA A-A

Addressed by: https://review.openstack.org/344225
    Support A/A in delete operations and get_capabilities

Addressed by: https://review.openstack.org/344226
    Support A/A on Scheduler operations

Addressed by: https://review.openstack.org/346041
    Cosmetic changes to scheduler

Addressed by: https://review.openstack.org/355968
    Add more operations to cluster

Addressed by: https://review.openstack.org/381835
    Make Replication support Active-Active

Addressed by: https://review.openstack.org/386186
    Make Image Volume Cache cluster aware

*Merged*
------------------------------------
Addressed by: https://review.openstack.org/286598
    Refactor sqlalchemy service methods

Addressed by: https://review.openstack.org/318572
    Add cluster table and related methods

Addressed by: https://review.openstack.org/327686
    Update Versioned Objects with Cluster object

Addressed by: https://review.openstack.org/327687
    Add cluster job distribution

Addressed by: https://review.openstack.org/327688
    Update manage with cluster related commands

Addressed by: https://review.openstack.org/327689
    Modify API to include cluster related operations

Cinderclient: Add cluster related commands
    https://review.openstack.org/#/c/327692/

Addressed by: https://review.openstack.org/303020
    Add cleanable base object and cleanup request VO

*Approved, waiting on merge*
-----------------------
None

*Work In Progress*
-------------------------
None

**CLEANUP**
==========

*Ready for review:*
-----------------------

Addressed by: https://review.openstack.org/303021
    Make c-vol use workers table for cleanup

Addressed by: https://review.openstack.org/363010
    Allow triggering cleanup from API

*Pending rebase on the new job distribution mechanism*
------------------------------------------------------------------------------

Addressed by: https://review.openstack.org/303024
    Add auto cleanup mechanism

Cinderclient: Add service node cleanup command
    https://review.openstack.org/304236

Cinderclient: Add Backup to cleanable resource types
    https://review.openstack.org/304237

Cinderclient: Add service node auto-cleanup
    https://review.openstack.org/304238

*Merged*
----------------------------------------------------

Addressed by: https://review.openstack.org/303018
    Add workers table

Addressed by: https://review.openstack.org/303019
    Add worker's DB operations

*Abandoned*
----------------------------------------------------
Addressed by: https://review.openstack.org/303022
    Allow triggering cleanup from API

Addressed by: https://review.openstack.org/303023
    Make c-bak use workers table for cleanup

**TOOZ LOCKS**
==============

*Merged:*
-----------------------

Addressed by: https://review.openstack.org/183537
    Tooz locks

Addressed by: https://review.openstack.org/263313
    Start/Stop coordinator with Services

Addressed by: https://review.openstack.org/185646
    Replace locks in volume manager

Addressed by: https://review.openstack.org/270240
    Replace locks in remotefs backend driver

Addressed by: https://review.openstack.org/#/c/331325
    Fix lock files littering working dir during tests
    (fixes issue with Replace locks in volume manager)

*Ready for review:*
-----------------------
None

*Work in progress:*
-----------------------

Addressed by: https://review.openstack.org/246352
    Add convenience lock methods to objects

Addressed by: https://review.openstack.org/333489
    [WIP] Clean up RemoteFSSnapDriver

--------------------------------------------------------------------------------------------

Gerrit topic: https://review.openstack.org/#q,topic:bug/1493476,n,z
Gerrit topic: https://review.openstack.org/#q,topic:bug/1238093,n,z
Gerrit topic: https://review.openstack.org/#q,topic:bug/1469659,n,z
Gerrit topic: https://review.openstack.org/#q,topic:bug/1526350,n,z
Gerrit topic: https://review.openstack.org/#q,topic:fix/api-races-simplified,n,z
Gerrit topic: https://review.openstack.org/#q,topic:bp/cinder-volume-active-active-ha,n,z
Gerrit topic: https://review.openstack.org/#q,topic:bp/coordinator-start-stop,n,z
Gerrit topic: https://review.openstack.org/#q,topic:bp/volume-manager-locks,n,z
Gerrit topic: https://review.openstack.org/#q,topic:ha/cleanup,n,z

Addressed by: https://review.openstack.org/335029
    Improve cinder-manage arg parsing

Addressed by: https://review.openstack.org/315541
    Refactor create, save, and destroy OVO methods

Addressed by: https://review.openstack.org/335138
    Fix CinderPersistentObject.refresh

Addressed by: https://review.openstack.org/335139
    Prevent doc generation failure on OVO decorators

Addressed by: https://review.openstack.org/344223
    Update OVO instance on destroy method call

Addressed by: https://review.openstack.org/344224
    Use original volume OVO instance in create flow

Gerrit topic: https://review.openstack.org/#q,topic:bug/1606698,n,z

Addressed by: https://review.openstack.org/348020
    Quobyte volume driver should use DLM

Gerrit topic: https://review.openstack.org/#q,topic:bug/1607059,n,z

Addressed by: https://review.openstack.org/348034
    Scality volume driver should use DLM

Gerrit topic: https://review.openstack.org/#q,topic:bug/1607074,n,z

Addressed by: https://review.openstack.org/348044
    Smbfs volume driver should use DLM

Addressed by: https://review.openstack.org/353069
    [PoC][Don't review] Testing delay

Addressed by: https://review.openstack.org/393460
    Prevent Active-Active on drivers by default

Addressed by: https://review.openstack.org/412534
    Prevent claiming and updating races on worker

Addressed by: https://review.openstack.org/413200
    Make notify_service_capabilities cluster aware

Gerrit topic: https://review.openstack.org/#q,topic:ha-aa/cleanup,n,z

Gerrit topic: https://review.openstack.org/#q,topic:transfer_locks,n,z

Addressed by: https://review.openstack.org/424360
    Lock transfers table on multi-table update

Addressed by: https://review.openstack.org/459153
    Add group to cluster when init host


Work Items

Remove API Races: INPROGRESS
Manager Local Locks: INPROGRESS
Job distribution: INPROGRESS
Cleanup: TODO
Data Corruption Prevention: TODO
Drivers' Locks: INPROGRESS