Compact backups support

Registered by Alexey Kopytov

It is possible to omit secondary index pages when backing up InnoDB
tables and then recreate the secondary indexes later, thus making
backups more compact at the cost of a more expensive prepare stage.

The goal of this blueprint is to introduce a new --compact switch to
both innobackupex and the xtrabackup binary. When that option is
specified, xtrabackup will mark the resulting backup as "compact" in
xtrabackup_checkpoints, and then skip secondary index pages while
copying InnoDB tables. To make prepare possible, it is also necessary
to store page translation maps for each InnoDB file as part of the
backup metadata.

On prepare, xtrabackup will first scan the data files, fix FIL_PAGE_*
offsets according to the translation maps stored in the backup, and
reset change buffer entries. When doing the actual prepare for compact
backups, it will ignore all updates to secondary index pages and use
the same translation maps to translate page offsets in log records.
After applying the log, secondary indexes are recreated using fast
index creation, and the page offsets of the index root nodes in the
data dictionary are updated accordingly.

Limitations:

- only per-table .ibd files can be compacted; compacting the system
  tablespace will not be supported.

Issue #19179.

Blueprint information

Status:
Complete
Approver:
Alexey Kopytov
Priority:
High
Drafter:
Alexey Kopytov
Direction:
Approved
Assignee:
Alexey Kopytov
Definition:
Approved
Series goal:
Accepted for 2.1
Implementation:
Implemented
Milestone target:
2.1.0-alpha1
Started by:
Alexey Kopytov
Completed by:
Stewart Smith

Whiteboard

Backup stage
============

When the --compact option is passed to either innobackupex or xtrabackup
at the backup stage, xtrabackup skips secondary index pages when copying
per-table spaces (i.e. .ibd files). It is therefore necessary to:

1) have a way to detect index pages
2) keep the clustered index pages

1) is achieved by checking the FIL_PAGE_TYPE field in the page header
and only processing pages which have the FIL_PAGE_INDEX value in that
field. The code comments for FIL_PAGE_TYPE say its value can only be
trusted for uncompressed index pages, though in reality it is
initialized to FIL_PAGE_INDEX for compressed index pages as well.
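
As an illustration, the detection check can be as small as the
following C sketch (the offset and magic value follow InnoDB's
fil0fil.h; the helper is modelled on InnoDB's mach_read_from_2(),
which reads big-endian integers):

#include <stdint.h>
#include <stdbool.h>

#define FIL_PAGE_TYPE  24      /* offset of the 2-byte page type field */
#define FIL_PAGE_INDEX 17855   /* page type value of B-tree index pages */

/* Page header fields are stored big-endian. */
static uint16_t mach_read_from_2(const unsigned char *b)
{
        return (uint16_t) ((b[0] << 8) | b[1]);
}

static bool page_is_index_page(const unsigned char *page)
{
        return mach_read_from_2(page + FIL_PAGE_TYPE) == FIL_PAGE_INDEX;
}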

The implementation of 2) is based on the fact that the root page of
the clustered index has a fixed offset within a per-table space (page
#3). This means the ID of the clustered index can be read from that
page, so clustered index pages can be detected by checking the
PAGE_INDEX_ID field in the index page header.

It is also important to keep root index pages for secondary indexes, as
those contain inode pointers to file segments containing leaf and
non-leaf pages, which are necessary to correctly reclaim those pages,
i.e. mark them as free and reusable by InnoDB.
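
Putting 1) and 2) together, the copy/skip decision might look like the
following hedged sketch (offsets follow InnoDB's page0page.h;
page_is_index_page() is from the sketch above, and recognizing root
pages is left to the caller here):

#include <stdint.h>
#include <stdbool.h>

#define FIL_PAGE_DATA  38   /* start of the index page header */
#define PAGE_INDEX_ID  28   /* 8-byte index ID within that header */

bool page_is_index_page(const unsigned char *page);  /* see above */

static uint64_t mach_read_from_8(const unsigned char *b)
{
        uint64_t v = 0;
        for (int i = 0; i < 8; i++)
                v = (v << 8) | b[i];
        return v;
}

/* Decide whether a page must be kept in the compact backup.
   clust_index_id is read from the clustered index root (page #3). */
static bool page_should_be_copied(const unsigned char *page,
                                  uint64_t clust_index_id,
                                  bool is_root_page)
{
        if (!page_is_index_page(page))
                return true;        /* non-index pages are always kept */
        if (is_root_page)
                return true;        /* root pages keep the inode pointers */
        return mach_read_from_8(page + FIL_PAGE_DATA + PAGE_INDEX_ID)
                == clust_index_id;  /* keep clustered index pages only */
}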

Ranges of skipped pages are written to per-tablespace files with the
".pmap" suffix appended to the tablespace file name
(e.g. "table.ibd.pmap"). The file format is a series of 2-value tuples,
with each value being a 4-byte page offset corresponding to the first
and the last endpoints of skipped ranges, respectively.
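
A minimal sketch of the writer side, assuming the two values are
stored big-endian like other InnoDB on-disk integers (the byte order
is not spelled out above):

#include <stdint.h>
#include <stdio.h>

/* One skipped range of pages, endpoints inclusive. */
typedef struct {
        uint32_t first;
        uint32_t last;
} page_range_t;

static void write_be32(FILE *f, uint32_t v)
{
        unsigned char b[4] = {
                (unsigned char) (v >> 24), (unsigned char) (v >> 16),
                (unsigned char) (v >> 8),  (unsigned char) v
        };
        fwrite(b, 1, sizeof(b), f);
}

/* Append one skipped range to the open "table.ibd.pmap" file. */
static int pmap_write_range(FILE *f, const page_range_t *r)
{
        write_be32(f, r->first);
        write_be32(f, r->last);
        return ferror(f) ? -1 : 0;
}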

Prepare stage
=============

At the prepare stage, the xtrabackup binary first checks whether a
compact backup is being prepared by reading the new "compact"
attribute from the xtrabackup_checkpoints file. If its value is 1, all
compacted tablespaces are expanded before the log is applied.
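
For reference, xtrabackup_checkpoints for a compact full backup would
contain something like the following (LSN values are illustrative):

backup_type = full-backuped
from_lsn = 0
to_lsn = 1631145
last_lsn = 1631145
compact = 1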

Expansion is implemented by reading the skipped page ranges from the
".pmap" file corresponding to each tablespace and copying the .ibd
file to a temporary one, writing specially marked empty pages in place
of the skipped (i.e. compacted) secondary index pages. Once the .ibd
file is fully expanded, the temporary file is renamed over the
original compacted .ibd file.
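
In outline, expanding one tablespace could look like the following
simplified sketch (it assumes uncompressed 16 KiB pages and omits
error handling):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE     (16 * 1024)
#define FIL_PAGE_DATA 38

/* As in the .pmap sketch above. */
typedef struct { uint32_t first, last; } page_range_t;

static bool page_is_skipped(uint32_t page_no,
                            const page_range_t *ranges, size_t n)
{
        for (size_t i = 0; i < n; i++)
                if (page_no >= ranges[i].first && page_no <= ranges[i].last)
                        return true;
        return false;
}

/* Expand src (the compacted .ibd) into dst (a temporary file): kept
   pages are copied verbatim, skipped ones are replaced with marked
   empty pages. The caller renames dst over src afterwards. */
static void expand_tablespace(FILE *src, FILE *dst,
                              const page_range_t *ranges, size_t n,
                              uint32_t n_pages)
{
        unsigned char page[PAGE_SIZE];
        unsigned char empty[PAGE_SIZE];

        memset(empty, 0, sizeof(empty));
        memcpy(empty + FIL_PAGE_DATA, "COMPACTP", 8);

        for (uint32_t page_no = 0; page_no < n_pages; page_no++) {
                if (page_is_skipped(page_no, ranges, n)) {
                        fwrite(empty, 1, PAGE_SIZE, dst);
                } else if (fread(page, 1, PAGE_SIZE, src) == PAGE_SIZE) {
                        fwrite(page, 1, PAGE_SIZE, dst);
                }
        }
}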

Empty pages replacing skipped secondary index pages are marked with
the "COMPACTP" magic string at the FIL_PAGE_DATA offset. Those pages
must be ignored on recovery, i.e. the steps that validate pages and
apply hashed log records and change buffer entries are skipped for
them.
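
Correspondingly, a recovery-time check for such placeholder pages can
be as simple as (sketch):

#include <stdbool.h>
#include <string.h>

#define FIL_PAGE_DATA 38

/* True for the marked empty pages written during expansion; such
   pages are exempted from validation, hashed log record application
   and change buffer merging. */
static bool page_is_compact_placeholder(const unsigned char *page)
{
        return memcmp(page + FIL_PAGE_DATA, "COMPACTP", 8) == 0;
}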

Rebuilding indexes
==================

Rebuilding indexes after applying log records to the tablespace is
performed when the new --rebuild-indexes option is passed to the
xtrabackup binary along with --prepare (or to innobackupex along with
--apply-log).

Rebuilding has to be an explicitly requested action in order to
support incremental backups. Since XtraBackup cannot know, when
applying an incremental backup to a compact full one, whether more
incremental backups will be applied to it later, rebuilding indexes
must be explicitly requested by the user once the full backup with all
incremental backups merged is ready to be restored. Rebuilding indexes
unconditionally on every incremental backup merge is not an option,
since it is an expensive operation.

To rebuild indexes after starting up an InnoDB instance and recovering
from the log file, XtraBackup traverses the InnoDB data dictionary and
processes all tables stored in separate tablespaces. For every such
table, XtraBackup reads the list of secondary indexes from the data
dictionary, drops them, and then recreates them with fast index
creation. It also discards all change buffer entries corresponding to
the recreated indexes.
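
In rough pseudocode, the rebuild pass is structured as follows (all
helper names here are hypothetical stand-ins for InnoDB dictionary and
fast-index-creation calls, not actual XtraBackup functions):

/* Hypothetical interfaces; declarations only, for illustration. */
typedef struct dict_table dict_table_t;
typedef struct dict_index dict_index_t;

dict_table_t *first_file_per_table_table(void);
dict_table_t *next_file_per_table_table(dict_table_t *t);
dict_index_t *first_secondary_index(dict_table_t *t);
dict_index_t *next_secondary_index(dict_index_t *i);
void discard_change_buffer_entries(dict_index_t *i);
void drop_secondary_index(dict_index_t *i);
void recreate_indexes_fast(dict_table_t *t);

static void rebuild_all_secondary_indexes(void)
{
        for (dict_table_t *t = first_file_per_table_table(); t != NULL;
             t = next_file_per_table_table(t)) {
                /* Drop every secondary index of the table (collecting
                   their definitions first is elided here)... */
                for (dict_index_t *i = first_secondary_index(t); i != NULL; ) {
                        dict_index_t *next = next_secondary_index(i);
                        discard_change_buffer_entries(i);
                        drop_secondary_index(i);
                        i = next;
                }
                /* ...then recreate them with fast index creation. */
                recreate_indexes_fast(t);
        }
}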

As a result, secondary indexes will get different IDs from those in
the original data (i.e. before the backup). This, however, is not a
problem as long as index rebuilding is only performed before restoring
the backup, i.e. after merging the last incremental backup (or right
after preparing the full backup, if no incremental backups are used).

Command line options
====================

To summarize the new command line options and their usage: if only
full backups are used, the basic innobackupex command lines are:

# Create a full compact backup
innobackupex --compact /data/backup
# Prepare a full compact backup
innobackupex --apply-log --rebuild-indexes /data/backup

Strictly speaking, the --rebuild-indexes option can be used with any
backup, not necessarily a compact one, to defragment secondary indexes.

Incremental backups:

# Create a full compact backup
innobackupex --compact /data/full
# Create an incremental backup
# (compacting incremental backups is not currently supported)
innobackupex --incremental /data/full /data/inc1
# Another incremental backup
innobackupex --incremental /data/inc1 /data/inc2

# Prepare the full backup, but do not rebuild indexes
innobackupex --apply-log --redo-only /data/full
# Merge incremental backups
innobackupex --apply-log --redo-only --incremental-dir=/data/inc1 /data/full
innobackupex --apply-log --redo-only --incremental-dir=/data/inc2 /data/full

# Rebuild indexes before restoring the merged backup
innobackupex --apply-log --rebuild-indexes /data/full
