Allow backup to cloud storage (Amazon S3 and Swift)

Registered by George Ormond Lorch III

Blueprint information

Status:
Complete
Approver:
Alexey Kopytov
Priority:
Not
Drafter:
George Ormond Lorch III
Direction:
Needs approval
Assignee:
George Ormond Lorch III
Definition:
Superseded
Series goal:
None
Implementation:
Unknown
Milestone target:
None
Completed by
Alexey Kopytov

Related branches

Sprints

Whiteboard

I believe what we need to create is a stand-alone utility that can take streamed input, buffer it up, and push it to the cloud, and then document how to use the utility in combination with the innobackupex --stream option.

-------------------------------------
xbs3 [put | get] [options]

  put - reads data from stdin, breaks it into individual blocks, and pushes the blocks to S3 as sequentially numbered objects within the specified bucket, along with a manifest file in the bucket that contains metadata about the block size, number of blocks, time of creation, etc. 'put' uses a double-buffering scheme where one buffer is filled from stdin while another is written to S3. If the write speed (after any throttling has been performed) is greater than the read speed from stdin, put will never need to make use of the queue feature described below and therefore will never need to go to disk.
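The block-splitting and manifest-writing half of 'put' could be sketched roughly as below (Python, since it is one of the candidate implementation languages). The manifest fields, the zero-padded object naming, and the `store` dict standing in for an S3 bucket are all assumptions for illustration; a real implementation would issue S3 PUT requests instead.

```python
import io
import json
import time


def put_stream(src, store, blocksize_kb=64):
    """Sketch of 'put': split a stream into fixed-size blocks, store each
    as a sequentially numbered object, and record a manifest describing
    them. `store` stands in for the S3 bucket (here just a dict)."""
    blocksize = blocksize_kb * 1024
    nblocks = 0
    while True:
        block = src.read(blocksize)
        if not block:
            break
        # Zero-padded index as the object name (naming scheme is an assumption)
        store["%010d" % nblocks] = block
        nblocks += 1
    store["manifest"] = json.dumps({
        "block_size": blocksize,
        "num_blocks": nblocks,
        "created": time.time(),
    })
    return nblocks
```

The last block is simply short rather than padded, so 'get' can rely on block_size * index offsets for every block except the final one.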

  get - pulls the manifest data from S3, sequentially reads the S3 objects/blocks, and writes the contiguous data to stdout.

  --s3accesskey : (put and get) Amazon S3 access key.

  --s3secretkey : (put and get) Amazon S3 secret key. Mutually exclusive with s3secretkeyfile, but one of the two must be specified.

  --s3secretkeyfile : (put and get) The name of a file that contains the secret key. Mutually exclusive with s3secretkey, but one of the two must be specified.

  --bucket : (put and get) Identifies the name of the bucket to interact with. For put, the bucket should be a unique name that does not yet exist.

  --tmpdir : (put and get) Specifies a location to use to queue up blocks to send to S3, or to store blocks when reading and writing to stdout. During put, this must have enough storage capacity to accommodate the queuedepth option.

  --region : (put only) Specifies the region to create the bucket in when performing put. (default="US Standard")

  --blocksize : (put only) Size of block (s3 object) to use in KB when performing put. (default=64)

  --queuedepth : Maximum depth of the queue (in blocks) to queue up on disk when performing put. When the queue is full, writes to xbs3's stdin will block. Specifying 0 disables queue blocking and requires that tmpdir have enough space to potentially store the entire backup in case the outgoing connection to S3 is slow. Specifying 1 effectively disables queueing altogether: stdin will block whenever the incoming buffer is full and the outgoing buffer is still being written, making the xbs3 utility a synchronous, blocking write target during backups and limiting the backup rate to the best possible outgoing S3 write rate. Any value greater than 1 sets the actual number of blocks allowed to queue up before stdin blocks. (default=0)
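The queuedepth behavior amounts to a bounded producer/consumer queue between the stdin reader and the S3 sender. A rough sketch (the `pump` helper and its arguments are hypothetical; Python's queue.Queue conveniently treats maxsize 0 as unbounded, matching the queuedepth=0 semantics, and here models the on-disk block queue):

```python
import queue
import threading


def pump(blocks, queuedepth, send):
    """Producer/consumer sketch of queuedepth: the reader side blocks on
    q.put() once the queue holds `queuedepth` blocks (0 = unbounded),
    while a sender thread drains blocks to `send`, which stands in for
    the outgoing S3 writer."""
    q = queue.Queue(maxsize=queuedepth)  # maxsize 0 means unbounded

    def sender():
        while True:
            b = q.get()
            if b is None:        # sentinel: no more blocks
                break
            send(b)

    t = threading.Thread(target=sender)
    t.start()
    for b in blocks:
        q.put(b)                 # blocks here when the queue is full
    q.put(None)
    t.join()
```

With queuedepth=1 this degenerates to the synchronous, blocking behavior described above, since each block must be drained before the next can be queued.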

--------------------------------------------------
Potential features to add in later version:
  minrate : Allows user to specify minimum rate failure detection if read/write to s3 becomes unacceptably slow.
    --minrateblocks : (put and get) Number of consecutive blocks to permit to be transmitted below minrate. Specifying 0 disables the minrate feature. (default=0)
    --minrate : (put and get) Minimum transfer rate in KB/sec. If the data transfer rate stays below this rate for minrateblocks consecutive blocks, xbs3 will terminate with an error. On put, this is only enforced when there is enough data available to push to S3 to perform the calculation; any rate drop due to waiting on incoming data is ignored. Specifying 0 disables the minrate feature. (default=0)
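The consecutive-slow-block check could look something like this (the class and its interface are hypothetical, purely to pin down the proposed semantics):

```python
class MinRateMonitor:
    """Sketch of the minrate feature: count consecutive blocks
    transferred below the floor rate; after min_rate_blocks such blocks
    in a row, signal that the transfer should abort. A value of 0 for
    either setting disables the check."""

    def __init__(self, minrate_kb, minrate_blocks):
        self.minrate = minrate_kb * 1024   # floor, in bytes/sec
        self.limit = minrate_blocks
        self.slow_run = 0                  # consecutive slow blocks seen

    def record(self, nbytes, seconds):
        """Record one block transfer; return True if xbs3 should abort."""
        if self.minrate == 0 or self.limit == 0:
            return False                   # feature disabled
        if nbytes / seconds < self.minrate:
            self.slow_run += 1
        else:
            self.slow_run = 0              # a fast block resets the run
        return self.slow_run >= self.limit
```

Resetting the counter on any fast block is what makes the "consecutive" requirement explicit: a transiently slow link only triggers the error if it stays slow.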

  maxrate : Allows user to throttle rate transfer to prevent xbs3 from starving other processes of network i/o.
    --maxrate : (put and get) Maximum transfer rate in KB/sec to allow.
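One simple way to implement the throttle is to sleep after each block just long enough that the running average stays at or below maxrate; a sketch under that assumption (`send` stands in for the actual S3 write):

```python
import time


def throttled_send(blocks, send, maxrate_kb):
    """Sketch of maxrate throttling: after each block, sleep until the
    cumulative average transfer rate drops to maxrate or below."""
    maxrate = maxrate_kb * 1024            # ceiling, in bytes/sec
    start = time.monotonic()
    sent = 0
    for b in blocks:
        send(b)
        sent += len(b)
        # Earliest time at which `sent` bytes are allowed to have gone out
        earliest = start + sent / maxrate
        delay = earliest - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```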

  parallel : Allows parallel transmission of s3 objects/blocks if bandwidth is sufficient or network latency is high enough that parallel may allow greater overall throughput by utilizing more of the available bandwidth.
    --parallel : (put and get) Specifies the number of threads/parallel blocks that can be transmitted at once.

----------------------------------------------------
Need input on language/libs, current implementation options are:
  - Build utility in C:
    - roll our own comm layer for REST/SOAP interface.
    - libs3 : widely used and considered the standard. License is GPL v3. Since this utility would be a free-standing tool with no actual linkage to any MySQL/InnoDB/Percona Server/XtraDB components, it _should_ be OK; otherwise we would have a possible GPL v2->v3 license issue.
    - aws4c - experimental/garage built and unknown usage. LGPL
    - libAWS - experimental/garage built and unknown usage. Apache 2.0
  - Build utility in Perl/Python or other language with native bindings/packages provided by Amazon.

(?)

Work Items
