Implement Cassandra point-in-time recovery

Registered by Denis M.

Native Cassandra Point-in-Time Recovery Process

Restoring a Cassandra keyspace means restoring all of the keyspace's SSTable files as they existed at a given point in time.

Cassandra does not provide a native restore utility, but does provide a restore procedure. For each node in the cluster:

0. Flush data to disk. Ideally this happens before Cassandra is shut down, because the
    commitlog directory is a shared resource of all keyspaces, not just the one to be
    restored.
1. Shut down Cassandra.
2. Clear all files in the commitlog directory (path defined by the commitlog_directory
    parameter in the cassandra.yaml file, by default /var/lib/cassandra/commitlog).
3. Remove all current contents of the active keyspace (all *.db files).
4. Copy the contents of the desired snapshot to the active keyspace.
5. Only if the restored snapshot is the latest one, and you want the latest backup, copy
    the contents of the backup directory into the active keyspace area, on top of the
    restored snapshot files.

Note that the process must be executed on all nodes in the cluster; otherwise, nodes that did not receive the restored data will “update” the restored nodes with the newer, bad data.
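The file-level part of the procedure above can be sketched in Python. This is a minimal illustration, not Trove or Cassandra code: the function name and its parameters are hypothetical, the directory paths are supplied by the caller, and flushing and stopping Cassandra are assumed to have been done already on the node.

```python
import shutil
from pathlib import Path
from typing import List, Optional


def restore_node_files(commitlog_dir: Path, data_dir: Path, snapshot_dir: Path,
                       incremental_dir: Optional[Path] = None) -> List[str]:
    """Apply the file-level restore steps on one node (hypothetical helper).

    Assumes data has been flushed and Cassandra is already stopped.
    Returns the names of the files copied into data_dir, in copy order.
    """
    # Clear the commit log so stale writes are not replayed on startup.
    for entry in list(commitlog_dir.iterdir()):
        if entry.is_file():
            entry.unlink()

    # Remove the current SSTable files (*.db); subdirectories are left alone.
    for sstable in list(data_dir.glob("*.db")):
        sstable.unlink()

    # Copy the snapshot first, then the backup directory on top of it if given.
    restored = []
    for source in (snapshot_dir, incremental_dir):
        if source is None:
            continue
        for item in sorted(source.iterdir()):
            if item.is_file():
                shutil.copy2(item, data_dir / item.name)
                restored.append(item.name)
    return restored
```

Passing `incremental_dir` only makes sense when the snapshot being restored is the latest one, as the last step of the procedure states. The same call would have to be made on every node in the cluster.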

Trove and Point-in-time recovery

OpenStack Trove (DBaaS) can restore an instance (a whole new instance, from scratch) from a backup previously stored in remote storage (OpenStack Swift, Amazon S3, etc.). From both the administrator's and the regular user's perspective, Trove should also be able to perform point-in-time recovery. The two operations are almost the same mechanically, but the difference between restore (in Trove terms) and recovery is significant.
Restore spins up a new instance from a backup, as mentioned above, whereas recovery restores an already running instance from a backup. Initially, Trove would be able to recover/restore a running instance from a full backup.

Trove core REST API and Point-in-Time Recovery/Restore flow

REST routes

HTTP method    Route
POST           {tenant_id}/instances/{instance_id}/recover
               or
               {tenant_id}/instances/{instance_id}/restore

Request body

{
    "recovery": {
        "instance": UUID,
        "backup": UUID
    }
}

Response object

{
    "recovery": {
        "id": "instance_id",
        "name": "instance_name",
        "status": "BUILDING",
        "datastore": "mysql",
        "recovered_from_backup": "backup_id",
        "point_in_time": "2011-01-22T13:25:27-06:00"
    }
}
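
As an illustration of the route and payloads above, the following sketch builds the request and reads the response using only the Python standard library. The helper names are hypothetical and are not part of python-troveclient; no HTTP call is made, and the UUID validation is an added assumption.

```python
import json
import uuid


def build_recovery_request(tenant_id, instance_id, backup_id, action="recover"):
    """Return (method, route, body) for a recovery request (hypothetical helper)."""
    if action not in ("recover", "restore"):
        raise ValueError("action must be 'recover' or 'restore'")
    # Assumption: both identifiers are UUID strings, as the request body suggests.
    instance = str(uuid.UUID(instance_id))
    backup = str(uuid.UUID(backup_id))
    route = f"{tenant_id}/instances/{instance}/{action}"
    body = json.dumps({"recovery": {"instance": instance, "backup": backup}})
    return "POST", route, body


def parse_recovery_response(raw):
    """Extract (instance id, status, backup id) from a response object."""
    recovery = json.loads(raw)["recovery"]
    return recovery["id"], recovery["status"], recovery["recovered_from_backup"]
```

A caller would send the returned body to the returned route with whatever HTTP client it already uses, then watch the status field move from BUILDING onward.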

Trove taskmanager RPC API and Point-in-Time Recovery/Restore flow

RPC message

RPC method              Method parameters
do_instance_recovery    instance_id, backup_id

RPC message type
    CAST, with polling until the instance reaches ACTIVE status.
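
Because a CAST returns no result, the caller has to poll for completion. A minimal sketch of such a polling loop follows; the function name, the timeout and interval defaults, and the `get_status` callable are assumptions for illustration, not Trove taskmanager code.

```python
import time
from typing import Callable


def poll_until_active(get_status: Callable[[], str], timeout: float = 60.0,
                      interval: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until it returns ACTIVE or the timeout expires.

    Returns True if ACTIVE was observed, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "ACTIVE":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; a production version would likely also stop early on an error status.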

Trove guestagent RPC API and Point-in-Time Recovery/Restore flow

RPC message

RPC method     Method parameters
do_recovery    backup_info: {
                   'id': backup_id,
                   'location': location,
                   'type': backup_type,
                   'checksum': checksum,
               }

RPC message type
    CAST
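
The parameters above can be assembled as follows. This is a sketch only: the helper name and the outer envelope with `method` and `args` keys are illustrative assumptions, not the wire format used by Trove's RPC layer; only the `backup_info` keys come from the table above.

```python
REQUIRED_KEYS = ("id", "location", "type", "checksum")


def build_do_recovery_cast(backup_id, location, backup_type, checksum):
    """Build an illustrative do_recovery message (hypothetical envelope)."""
    backup_info = {
        "id": backup_id,
        "location": location,
        "type": backup_type,
        "checksum": checksum,
    }
    # Refuse to send a message with an empty or missing field.
    missing = [key for key in REQUIRED_KEYS if not backup_info[key]]
    if missing:
        raise ValueError("missing backup_info fields: " + ", ".join(missing))
    return {"method": "do_recovery", "args": {"backup_info": backup_info}}
```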

Method implementation

No new code is needed; the existing restore functionality is reused.

Proposed implementation for Trove and for Python-troveclient

Trove: [1]
Python-troveclient: [2]

Useful links
[1] https://review.openstack.org/#/c/77222/
[2] https://review.openstack.org/#/c/77223/

Blueprint information

Status: Complete
Approver: None
Priority: Undefined
Drafter: Denis M.
Direction: Needs approval
Assignee: Denis M.
Definition: Obsolete
Series goal: None
Implementation: Unknown
Milestone target: None
Completed by: Denis M.

Whiteboard

Please follow BP template if approval is needed. Thanks!
