Parallel compression

Registered by Alexey Kopytov on 2011-06-02

This has already been implemented in
lp:~percona-dev/percona-xtrabackup/xtrabackup-parallel-compression.

Below is a summary from the commit message:

  * InnoDB files are now streamed by the xtrabackup binary rather than
    innobackupex. As a result, data integrity is now verified by
    xtrabackup itself, so tar4ibd is no longer needed and has been removed.

  * The xtrabackup binary now accepts a new '--stream' option, which has
    exactly the same semantics as the '--stream' option in innobackupex:
    it tells xtrabackup to stream all files to standard output in the
    specified format rather than storing them locally.

  * The xtrabackup binary can now do parallel compression using the
    quicklz library. Two new options were added to xtrabackup to support
    this feature:

    - '--compress' tells xtrabackup to compress all output data, including
      the transaction log file and metadata files, using the specified
      compression algorithm. The only currently supported algorithm is
      'quicklz'. The resulting files use the qpress archive format, i.e.
      every *.qp file produced by xtrabackup is essentially a one-file
      qpress archive that can be extracted and uncompressed with the qpress
      file archiver (http://www.quicklz.com/).

    - '--compress-threads' specifies the number of worker threads used by
      xtrabackup for parallel data compression. This option defaults to 1.

    Parallel compression ('--compress-threads') can be used together with
    parallel file copying ('--parallel'). For example, '--parallel=4
    --compress --compress-threads=2' will create 4 IO threads that will
    read the data and pipe it to 2 compression threads.

    New algorithms (gzip, bzip2, etc.) may be added later with minor
    effort.

  * To support simultaneous compression and streaming, a new custom
    streaming format called xbstream was introduced to XtraBackup in
    addition to the tar format. It was needed to overcome a limitation of
    traditional archive formats such as tar, cpio and others, which do not
    allow streaming dynamically generated files, for example files that
    are being compressed on the fly: in tar and cpio, each member's header
    records the file size before the file contents, so the size must be
    known up front. Other advantages of xbstream over traditional
    streaming/archive formats include the ability to stream multiple files
    concurrently (so streaming in the xbstream format can be combined with
    the '--parallel' option) and more compact data storage.

  * To allow streaming and extracting files to/from the xbstream format
    produced by xtrabackup, a new utility aptly called 'xbstream' was
    added to the XtraBackup distribution. This utility has a tar-like
    interface:

      - with the '-x' option it extracts files from the stream read from
        its standard input to the current directory unless specified
        otherwise with the '-C' option.

      - with the '-c' option it streams files specified on the command
        line to its standard output.

    The utility also tries to minimize its impact on the OS page cache by
    using the appropriate posix_fadvise() calls when available.
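
The '--parallel=4 --compress --compress-threads=2' example describes a producer/consumer pipeline in which IO threads read data and hand it to a pool of compression threads. The Python sketch below illustrates only that shape; it is not xtrabackup's implementation. It assumes zlib as a stand-in for quicklz (which is not in the Python standard library), an in-memory dict standing in for data files on disk, and fixed-size chunks so several threads can work on one file. The parallel_compress() function and its io_threads/compress_threads arguments are hypothetical and only loosely mirror '--parallel' and '--compress-threads'.

```python
import queue
import threading
import zlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from xtrabackup
_STOP = None            # sentinel telling a compression worker to exit


def parallel_compress(files, io_threads=4, compress_threads=2):
    """Compress `files` ({name: bytes}) with IO threads feeding compression threads.

    Returns {name: [compressed chunk, ...]} with chunks in their original order.
    zlib stands in for quicklz; each chunk is compressed independently.
    """
    pending = queue.Queue()  # file names not yet claimed by an IO thread
    for name in files:
        pending.put(name)
    work = queue.Queue(maxsize=compress_threads * 4)  # bounded hand-off between stages
    results = {name: {} for name in files}
    lock = threading.Lock()

    def io_worker():
        # Claim whole files and split them into fixed-size chunks.
        while True:
            try:
                name = pending.get_nowait()
            except queue.Empty:
                return
            data = files[name]
            for seq, off in enumerate(range(0, len(data), CHUNK_SIZE)):
                work.put((name, seq, data[off:off + CHUNK_SIZE]))

    def compress_worker():
        # Compress chunks from any file until the stop sentinel arrives.
        while True:
            item = work.get()
            if item is _STOP:
                return
            name, seq, chunk = item
            packed = zlib.compress(chunk)
            with lock:
                results[name][seq] = packed

    readers = [threading.Thread(target=io_worker) for _ in range(io_threads)]
    compressors = [threading.Thread(target=compress_worker) for _ in range(compress_threads)]
    for t in readers + compressors:
        t.start()
    for t in readers:
        t.join()
    for _ in compressors:
        work.put(_STOP)  # queued only after every chunk has been queued
    for t in compressors:
        t.join()
    return {name: [chunks[i] for i in sorted(chunks)] for name, chunks in results.items()}
```

Because each chunk is compressed independently, any worker can take any chunk, and the results are put back in order by sequence number. The trade-off is typically a somewhat lower compression ratio than compressing each file as a single stream.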
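
The benefit of a chunk-based stream can be illustrated with a made-up framing in which every chunk carries its own file name and payload length. Because a reader needs only each chunk's length, a writer never has to know a file's final size in advance, and chunks from different files (for example, output from several compression threads) can be interleaved freely. This Python sketch is hypothetical throughout: the XCHK marker, the field layout, and the write_chunk()/read_stream() names are invented for illustration and are not the real xbstream format or the xbstream utility's interface.

```python
import io
import struct

MAGIC = b"XCHK"  # hypothetical chunk marker, not the real xbstream magic


def write_chunk(out, path, payload):
    """Append one chunk: marker, path length, path, payload length, payload."""
    name = path.encode("utf-8")
    out.write(MAGIC)
    out.write(struct.pack("<I", len(name)))     # 4-byte little-endian path length
    out.write(name)
    out.write(struct.pack("<Q", len(payload)))  # 8-byte little-endian payload length
    out.write(payload)


def read_stream(inp):
    """Reassemble an interleaved chunk stream into {path: bytes}."""
    files = {}
    while True:
        marker = inp.read(len(MAGIC))
        if not marker:
            break  # clean end of stream
        if marker != MAGIC:
            raise ValueError("corrupt stream: bad chunk marker")
        (name_len,) = struct.unpack("<I", inp.read(4))
        path = inp.read(name_len).decode("utf-8")
        (size,) = struct.unpack("<Q", inp.read(8))
        # Chunks for the same path are appended in the order they appear.
        files.setdefault(path, bytearray()).extend(inp.read(size))
    return {path: bytes(data) for path, data in files.items()}


# Chunks from two files written in interleaved order.
buf = io.BytesIO()
write_chunk(buf, "a.qp", b"first ")
write_chunk(buf, "b.qp", b"other")
write_chunk(buf, "a.qp", b"second")
buf.seek(0)
print(read_stream(buf))  # {'a.qp': b'first second', 'b.qp': b'other'}
```

Handling of truncated input and any integrity checks are omitted to keep the sketch short.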
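
The page-cache hint mentioned above can be sketched in Python with os.posix_fadvise(), which exists only on some Unix platforms (hence the hasattr() check, mirroring "when available"). Which advice values xbstream actually passes is not stated here: POSIX_FADV_SEQUENTIAL before reading and POSIX_FADV_DONTNEED on each range already copied are assumptions, and stream_file() is a hypothetical helper.

```python
import io
import os
import tempfile


def stream_file(path, out, chunk_size=1 << 20):
    """Copy `path` to the binary stream `out`, hinting the kernel not to keep its pages cached."""
    has_fadvise = hasattr(os, "posix_fadvise")  # not available on every platform
    fd = os.open(path, os.O_RDONLY)
    try:
        if has_fadvise:
            # Assumed hint: the whole file (len 0 = to end of file) is read sequentially.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            out.write(data)
            if has_fadvise:
                # Assumed hint: the range just copied is no longer needed in the page cache.
                os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
        return offset
    finally:
        os.close(fd)


# Usage: stream a small temporary file into an in-memory buffer.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"page data" * 1000)
    tmp.flush()
    sink = io.BytesIO()
    copied = stream_file(tmp.name, sink)
```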

Blueprint information

Status: Complete
Approver: Stewart Smith
Priority: High
Drafter: None
Direction: Approved
Assignee: Alexey Kopytov
Definition: Approved
Series goal: Accepted for 2.0
Implementation: Implemented
Milestone target: 1.9.1
Started by: Alexey Kopytov on 2011-07-03
Completed by: Alexey Kopytov on 2011-10-23

Whiteboard

Setting to "Beta Available" as it hasn't been merged yet.
