Implement Cassandra point-in-time recovery
Native Cassandra Point-in-Time Recovery Process
Restoring a Cassandra keyspace means restoring all of the keyspace's SSTable files as they existed at a given point in time.
Cassandra does not provide a native restore utility, but does provide a restore procedure. For each node in the cluster:
0. Flush data.
1. Shut down Cassandra.
2. Clear all files in commitlog directory (path defined by the <CommitLogDirec
parameter in the cassandra.yaml file, by default /var/lib/
3. Ideally, flush the logs before shutting Cassandra down, because the commitlog
directory is shared by all keyspaces, not just the one being restored.
4. Remove all current contents of the active keyspace (all *.db files).
5. Copy the contents of the desired snapshot to the active keyspace.
6. Only if the restored snapshot is the latest one and you want the latest backup, copy
the contents of the backup directory into the active keyspace area on top of the restored
snapshot files.
Note that the process must be executed on all nodes in the cluster; otherwise, nodes that did not get the restored data will “update” the restored nodes with the newer, bad data.
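The per-node file operations above (steps 2, 4, 5, and 6) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Cassandra tooling: the function name `restore_keyspace_files` and all of its directory arguments are hypothetical, the node is assumed to be already flushed and shut down (steps 0 and 1), and each directory is assumed to hold its files in a flat layout.

```python
import shutil
from pathlib import Path
from typing import Optional


def restore_keyspace_files(commitlog_dir: Path, keyspace_dir: Path,
                           snapshot_dir: Path,
                           backup_dir: Optional[Path] = None) -> list:
    """Hypothetical helper: file-level restore of one keyspace on one node.

    Assumes Cassandra on this node is already flushed and stopped.
    Returns the sorted names of the files now present in keyspace_dir.
    """
    # Step 2: clear all files in the commitlog directory.
    for entry in commitlog_dir.iterdir():
        if entry.is_file():
            entry.unlink()

    # Step 4: remove the current SSTable files (*.db) of the active keyspace.
    for sstable in keyspace_dir.glob("*.db"):
        sstable.unlink()

    # Step 5: copy the chosen snapshot into the active keyspace directory.
    for src in snapshot_dir.iterdir():
        if src.is_file():
            shutil.copy2(src, keyspace_dir / src.name)

    # Step 6 (optional): layer the backup files on top of the snapshot.
    # Only pass backup_dir when the restored snapshot is the latest one.
    if backup_dir is not None:
        for src in backup_dir.iterdir():
            if src.is_file():
                shutil.copy2(src, keyspace_dir / src.name)

    return sorted(p.name for p in keyspace_dir.iterdir() if p.is_file())
```

The helper would have to run on every node in the cluster before any node is restarted, for the reason given in the note above.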
Trove and Point-in-time recovery
OpenStack DBaaS Trove can restore an instance (a whole new instance, from scratch) from a backup previously stored in remote storage (OpenStack Swift, Amazon AWS S3, etc.). From administration/
Restore spins up a new instance from a backup (as mentioned earlier), whereas recovery restores an already running instance from a backup. Initially, Trove would be able to recover/restore a running instance from a full backup.
Trove core ReST API and Point-in-Time Recovery/Restore flow
ReST routes
HTTP method: POST
Routes: {tenant_ or {tenant_
Request body
{
    "recovery": {
        "instance": "UUID",
        "backup": "UUID"
    }
}
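As an illustration of the request body above, the following Python sketch builds and serializes it. The helper name `build_recovery_request` is hypothetical and not part of python-troveclient. Validating the UUIDs on the client side is an assumption of this sketch, not something the blueprint specifies.

```python
import json
import uuid


def build_recovery_request(instance_id: str, backup_id: str) -> str:
    """Hypothetical helper: serialize the recovery request body as JSON."""
    # uuid.UUID raises ValueError on malformed input, so bad IDs fail early.
    instance = str(uuid.UUID(instance_id))
    backup = str(uuid.UUID(backup_id))
    return json.dumps({"recovery": {"instance": instance, "backup": backup}})
```

The resulting string would be sent as the body of the POST request to one of the routes listed above.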
Response object
{
    "recovery": {
        "id": "instance_id",
        "name": "instance_name",
        "status": "BUILDING",
        "datastore": "mysql",
        "recovered_
        "point_
    }
}
Trove taskmanager RPC API and Point-in-Time Recovery/Restore flow
RPC message
RPC method: do_instance_
Method parameters: instance_id, backup_id
RPC message type
CAST, followed by polling until the instance reaches ACTIVE status.
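The cast-then-poll pattern can be sketched in Python as follows. This is an illustrative sketch only, not Trove taskmanager code: `wait_for_active`, its `get_status` callable, and the default timeout and interval values are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_for_active(get_status: Callable[[], str], timeout: float = 600.0,
                    interval: float = 5.0) -> bool:
    """Hypothetical helper: poll until the instance reports ACTIVE.

    get_status is any callable returning the current status string.
    Returns True on ACTIVE, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "ACTIVE":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Because a CAST returns no result to the caller, the status has to be read from elsewhere on each poll; passing it in as a callable keeps the sketch independent of where the status is actually stored.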
Trove guestagent RPC API and Point-in-Time Recovery/Restore flow
RPC message
RPC method: do_recovery
Method parameters:
}
RPC message type
CAST
Method implementation
No new code is needed; the existing restore functionality is reused.
Proposed implementation for Trove and for Python-troveclient
Trove: [1]
Python-troveclient: [2]
Useful links
[1] https:/
[2] https:/
Whiteboard
Please follow BP template if approval is needed. Thanks!