Possible inconsistency between InnoDB and .frm

Bug #803556 reported by Vadim Tkachenko
This bug affects 1 person
Affects: Percona XtraBackup (moved to https://jira.percona.com/projects/PXB)
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

This is more of a blueprint than a bug, and we may keep it at low priority.

When we have a lot of tables (by a lot I mean 100,000 or more),
copying .frm files may take significant time.

If we run with the --no-lock option, it is possible to run CREATE/DROP/MODIFY/ALTER TABLE
while the .frm files are being copied, and we may end up with a backup in which the .frm files
and the InnoDB data dictionary are inconsistent.

It would be good to have a procedure to sync the .frm files using xtrabackup_logfile, but it may be tricky
to re-create .frm files from the log file.

Changed in percona-xtrabackup:
importance: Undecided → Wishlist
Ben Hencke (brainstar) wrote :

I have a similar problem, with thousands of databases of almost a hundred tables each. In my case I can be sure no .frm files change during the backup, but I still can't use --no-lock because I need the binlog information to a) do point-in-time restores, and b) spawn slave instances.

The innobackupex script seems to spawn a cp process for each file (I am backing up to an NFS mount). So in addition to the time it takes to copy the files, there is the cost of spawning the processes. There is a small optimization for scp when you use the remote option, but it only groups files per db. The tar option also seems to spawn a tar process for each file.

It can take tens of minutes to finish, and meanwhile the entire MySQL server is read-locked.

I also discovered that the MySQL slow log will not log queries whose time is spent waiting on the lock held by 'flush tables with read lock' unless the slow log time is 0. It was quite a mystery to see a client waiting several minutes for a query while nothing showed up in the slow log.

Ben Hencke (brainstar) wrote :

I'm attaching a patch made from xtrabackup-1.6.3 (innobackupex reports v1.5.1-xtrabackup). This adds a --rsync option, which reduces the lock time significantly, but still keeps

The script does an initial pre-lock rsync of the various db files, then, while the lock is held, a second rsync of any changes that were flushed. The second rsync should run much faster since it only needs to copy changed files, and rsync is very efficient at comparing thousands of files.

It still uses all of the same logic it would use to run the copy, but instead of copying it writes the list of files to a temp file. It then calls rsync once, so it should also avoid spawning thousands of processes while staying compatible with all of the various single table/db options.
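
As a rough illustration of the approach described above (not the patch itself, which modifies the Perl innobackupex script), here is a minimal Python sketch. The function names, the suffix list, and the `run` parameter are hypothetical choices made for the example; the only rsync options used are `-t` and `--files-from`.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def collect_files(datadir, suffixes=(".frm", ".MYD", ".MYI")):
    """Return paths, relative to datadir, of files with the given suffixes.

    The suffix list is an assumption for illustration only.
    """
    root = Path(datadir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


def rsync_command(datadir, backup_dir, list_path):
    """Build a single rsync invocation covering every path in list_path."""
    # -t preserves modification times, so a later pass can skip unchanged files.
    return ["rsync", "-t", f"--files-from={list_path}", str(datadir), str(backup_dir)]


def rsync_pass(datadir, backup_dir, run=subprocess.run):
    """Write the file list to a temp file, then call rsync exactly once."""
    files = collect_files(datadir)
    with tempfile.NamedTemporaryFile("w", suffix=".list", delete=False) as fh:
        fh.writelines(name + "\n" for name in files)
    try:
        run(rsync_command(datadir, backup_dir, fh.name), check=True)
    finally:
        os.unlink(fh.name)
    return files
```

Under these assumptions, a backup would call `rsync_pass` once before taking the read lock and once more while the lock is held, so that only the second, shorter pass contributes to lock time.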

LIMITATIONS: I haven't made it work with the --remote-host or --stream options. It will not remove files that existed in the first copy and were then deleted by the time the tables were locked, i.e. by DROP or RENAME. It would be possible to diff the first-pass file list against the second and remove the appropriate files, but I haven't implemented that.
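
The unimplemented cleanup mentioned above could look roughly like the following Python sketch. It is not part of the patch; `stale_files` and `remove_stale` are hypothetical names, and the sketch assumes both passes produce lists of paths relative to the data directory.

```python
from pathlib import Path


def stale_files(first_pass, second_pass):
    """Paths copied in the first pass that are absent from the second pass,
    for example because the table was dropped or renamed in between."""
    return sorted(set(first_pass) - set(second_pass))


def remove_stale(backup_dir, first_pass, second_pass):
    """Delete stale copies from the backup and return what was removed."""
    removed = []
    for rel in stale_files(first_pass, second_pass):
        target = Path(backup_dir) / rel
        if target.exists():
            target.unlink()
            removed.append(rel)
    return removed
```

A set difference is enough here because only the presence of a path matters; files that changed between passes are already handled by the second rsync.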

To test, I created 10,000 dummy tables, then ran a backup with different settings, dropping the filesystem cache before each run.

Using the unmodified version, the copy took 42 seconds. Total lock time was 54 seconds (from acquire to release).

Using the --rsync version, the first rsync took 21 seconds (no lock), and the second (lock) took 2 seconds. Total lock time was 15 seconds.

Ben Hencke (brainstar) wrote :

Please ignore the previous patch. There was a bug that showed up with larger databases (where the copy took longer than the keepalive timer): the script was trying to keep alive the not-yet-opened mysql connection.

Alexey Kopytov (akopytov) wrote :
Stewart Smith (stewart) wrote :

--rsync was added in 1.6.4, so marking Fix Released.

Changed in percona-xtrabackup:
status: New → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-976
