Percona XtraBackup moved to https://jira.percona.com/projects/PXB

Possible inconsistency between InnoDB and .frm

Bug #803556 reported by Vadim Tkachenko on 2011-06-29

This bug affects 1 person

Affects: Percona XtraBackup
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

This is more of a blueprint than a bug, and we can keep it at low priority.

When we have a lot of tables (by "a lot" I mean 100,000 or more),
copying the .frm files may take a significant amount of time.

If we run with the --no-lock option, then CREATE, DROP, or ALTER TABLE statements can run
while the .frm files are being copied, and we may end up with a backup in which the .frm files
are inconsistent with the InnoDB data dictionary.
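To make the failure mode concrete, here is a minimal Python sketch (not part of XtraBackup) that flags table-level mismatches between the .frm files in a backup directory and a set of InnoDB dictionary table names in "db/table" form. It assumes the usual <datadir>/<database>/<table>.frm layout, and it assumes the caller obtains the dictionary names some other way (the hypothetical `innodb_names` argument), already restricted to InnoDB tables, since MyISAM tables also have .frm files. It only catches tables created or dropped mid-copy; an ALTER that changes columns leaves both names present and would need a deeper comparison.

```python
from pathlib import Path


def frm_tables(backup_dir):
    """Return {"db/table", ...} for every .frm file one level below backup_dir.

    Assumes the layout <backup_dir>/<database>/<table>.frm.
    """
    root = Path(backup_dir)
    # "*/*.frm" matches .frm files inside each database directory.
    return {f"{frm.parent.name}/{frm.stem}" for frm in root.glob("*/*.frm")}


def dictionary_mismatches(frm_names, innodb_names):
    """Compare two sets of "db/table" names.

    innodb_names is a hypothetical input: the InnoDB tables recorded in the
    data dictionary of the same backup, gathered by some other means.
    Returns (only_frm, only_innodb) as sorted lists.
    """
    only_frm = sorted(frm_names - innodb_names)      # .frm copied, no dictionary entry
    only_innodb = sorted(innodb_names - frm_names)   # dictionary entry, no .frm copied
    return only_frm, only_innodb
```

A table created after its database directory was copied shows up in `only_innodb`; a table dropped after its .frm was copied shows up in `only_frm`.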

It would be good to have a procedure to sync the .frm files using xtrabackup_logfile, but it may be tricky
to re-create .frm files from the log file.

Valentine Gostev (longbow) on 2011-07-25

Changed in percona-xtrabackup:
importance:	Undecided → Wishlist


Ben Hencke (brainstar) wrote on 2011-10-31:

I also have a similar problem, with thousands of databases of almost a hundred tables each. In my case I can be sure that no .frm files change during the backup, but I still can't use --no-lock because I need the binlog information to a) do point-in-time restores, and b) spawn slave instances.

The innobackupex script seems to spawn a cp process for each file (I am backing up to an NFS mount), so in addition to the time it takes to copy the files, it also spends time spawning processes. There is a small optimization for scp when you use the remote option, but it only groups files per database. The tar option also seems to spawn a tar process for each file.

It can take tens of minutes to finish, and meanwhile the entire MySQL server is read-locked.

I also discovered that the MySQL slow query log does not record queries that were slow only because they waited on the lock taken by FLUSH TABLES WITH READ LOCK, unless long_query_time is 0. It was quite a mystery to see a client waiting several minutes on a query while nothing showed up in the slow log.


Ben Hencke (brainstar) wrote on 2011-11-02:

Attachment: Partial solution, doesn't address DROP or RENAME DDL (5.3 KiB, text/plain)

I'm attaching a patch made from xtrabackup-1.6.3 (innobackupex reports v1.5.1-xtrabackup). This adds a --rsync option, which reduces the lock time significantly, but still keeps

The script does an initial rsync of the various database files before taking the lock, then, while the lock is held, rsyncs any changes that were flushed. The second rsync should run much faster, since it only needs to copy changed files, and rsync is very efficient when comparing thousands of files.
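The two-pass procedure can be sketched as below. This is a Python illustration of the idea, not the attached patch (which modifies the Perl innobackupex script); `sync`, `lock`, and `unlock` are hypothetical callables standing in for one rsync pass and for FLUSH TABLES WITH READ LOCK / UNLOCK TABLES.

```python
def two_pass_copy(sync, lock, unlock):
    """Copy files in two passes, holding the global read lock only for the second.

    sync:   hypothetical callable that runs one rsync pass over the file list
    lock:   hypothetical callable that acquires the lock (FLUSH TABLES WITH READ LOCK)
    unlock: hypothetical callable that releases it (UNLOCK TABLES)
    """
    sync()          # pass 1: bulk copy while the server keeps accepting writes
    lock()
    try:
        sync()      # pass 2: rsync only transfers files changed since pass 1
    finally:
        unlock()    # release the lock even if the second pass fails
```

The design point is that the expensive full copy happens outside the lock; only the incremental pass, whose cost depends on how much changed, is paid while the server is read-locked.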

It still uses all of the same logic it would use to run the copy, but writes the list of files to a temporary file instead. It then calls rsync once, so it should also avoid spawning thousands of processes and still be compatible with all of the various single-table/database options.
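A minimal Python sketch of the "write the list, call rsync once" step, assuming rsync's standard `--files-from` option (which reads relative paths, one per line, and resolves them against the source directory). The function names and the injectable `run` parameter are illustrative, not taken from the patch.

```python
import os
import subprocess
import tempfile


def build_rsync_command(file_list_path, src_dir, dest_dir):
    # --files-from: read relative paths from file_list_path and resolve them
    # against src_dir, so a single rsync process copies every listed file.
    # -a preserves permissions and timestamps (it does not imply -r here).
    return ["rsync", "-a", f"--files-from={file_list_path}", src_dir, dest_dir]


def rsync_files(paths, src_dir, dest_dir, run=subprocess.run):
    """Copy every relative path in `paths` from src_dir to dest_dir with one rsync call."""
    # Write the file list to a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".lst", delete=False) as f:
        f.writelines(p + "\n" for p in paths)
        list_path = f.name
    try:
        cmd = build_rsync_command(list_path, src_dir, dest_dir)
        run(cmd, check=True)  # raises CalledProcessError if rsync fails
    finally:
        os.unlink(list_path)  # the list is only needed for this one call
    return cmd
```

With the default `run=subprocess.run` this needs rsync on PATH; passing a stub for `run` lets you inspect the command without copying anything.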

LIMITATIONS: I haven't made it work with the --remote-host or --stream options. It will not remove files that existed during the first copy but were deleted by the time the tables were locked, i.e. by DROP or RENAME. It would be possible to diff the first-pass file list against the second and remove the appropriate files, but I haven't implemented that.
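The unimplemented cleanup could look roughly like this Python sketch. It assumes both passes produce lists of paths relative to the backup directory and that the second list is gathered while the lock is held; the function names are hypothetical. A RENAME shows up as the old name missing from the second list, while the new name is simply copied by the second pass.

```python
import os


def stale_files(first_pass, second_pass):
    """Relative paths copied in pass 1 that were gone by pass 2 (e.g. after DROP or RENAME)."""
    return sorted(set(first_pass) - set(second_pass))


def remove_stale(backup_dir, first_pass, second_pass):
    """Delete stale copies from backup_dir and return the relative paths removed."""
    removed = []
    for rel in stale_files(first_pass, second_pass):
        target = os.path.join(backup_dir, rel)
        if os.path.isfile(target):  # skip anything already gone
            os.remove(target)
            removed.append(rel)
    return removed
```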

To test, I created 10,000 dummy tables, then ran backups with different settings, dropping the filesystem cache before each run.

Using the unmodified version, the copy took 42 seconds. Total lock time was 54 seconds (from acquire to release).

Using the --rsync version, the first rsync took 21 seconds (no lock), and the second (lock) took 2 seconds. Total lock time was 15 seconds.
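A quick calculation on the reported figures: subtracting the copy time from the total lock time leaves about 12 to 13 seconds of locked time in both runs that is not spent copying (presumably the remaining work done under the read lock; that attribution is an inference, not something the report states). The saving therefore comes almost entirely from the copy step, and total lock time falls by roughly 72%.

```python
def other_locked_time(total_lock_s, copy_under_lock_s):
    """Seconds the lock was held for work other than copying files."""
    return total_lock_s - copy_under_lock_s


def reduction(before_s, after_s):
    """Fractional reduction going from before_s to after_s."""
    return 1 - after_s / before_s


# Figures reported above for the 10,000-table test (seconds).
baseline_other = other_locked_time(54, 42)  # unmodified: 42 s copy, 54 s locked
rsync_other = other_locked_time(15, 2)      # --rsync: 2 s second pass, 15 s locked
lock_cut = reduction(54, 15)

print(baseline_other, rsync_other, f"{lock_cut:.0%}")  # prints: 12 13 72%
```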


Ben Hencke (brainstar) wrote on 2011-11-04:

Attachment: Partial solution, doesn't address DROP or RENAME DDL (fix for keepalive SIGPIPE) (5.4 KiB, text/plain)

Please ignore the previous patch. It had a bug that showed up with larger databases (when the copy took longer than the keepalive timer): it tried to send a keepalive on the MySQL connection before that connection had been opened.


Alexey Kopytov (akopytov) wrote on 2011-11-10:

I created https://blueprints.launchpad.net/percona-xtrabackup/+spec/rsync-for-non-innodb-files to merge the contributed patch.