Write changed page bitmap in XtraDB
Operation and architecture overview
The InnoDB/XtraDB changed page tracking is done by a new thread (log0online.h, log0online.c) that reads and parses the (space; page) pairs out of the written log data. The tracking is controlled by a new read-only server variable --innodb-
The 'tracked_lsn' field contains the LSN up to which all changes have been parsed. The value (current LSN - tracked LSN) has a maximum limit, which is equal to the maximum checkpoint age. If the limit is violated, server operation stops until the tracking catches up.
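The lag rule above can be sketched as a single predicate. This is an illustration in Python, not the server's C code: the function and parameter names are hypothetical, and treating a lag exactly equal to the limit as still acceptable is an assumption.

```python
def tracking_lag_exceeded(current_lsn, tracked_lsn, max_checkpoint_age):
    """Return True if (current LSN - tracked LSN) is over the limit.

    Hypothetical helper: the limit is taken to be the maximum checkpoint
    age, and a lag equal to the limit is assumed to be acceptable.
    """
    if tracked_lsn > current_lsn:
        raise ValueError("tracked LSN cannot be ahead of the current LSN")
    return current_lsn - tracked_lsn > max_checkpoint_age
```

A caller would keep waiting while the predicate is true and proceed once the tracker has advanced tracked_lsn far enough.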
For better concurrency, the tracked_lsn field is not protected by the log_sys mutex or any other mutex. It is accessed using atomic operation primitives. InnoDB in 5.1 does not have 64-bit atomic primitives, so they are backported from InnoDB in 5.6. On platforms that lack 64-bit atomics, a fallback implementation protects the field with the log_sys mutex.
On server startup, the log reader thread opens the last tracked bitmap file, truncates it to a multiple of the bitmap block length, and reads the last page to find the last LSN tracked in that file. If the last page fails its checksum check or does not have the last page flag set, the file is read backwards one page at a time until a page that satisfies both conditions is found. This LSN is then compared with the server start LSN. If they differ, there is a hole in the tracked LSN interval, e.g. due to a crash or a srv_fast_shutdown=2 shutdown. The hole is either closed by immediately reading and parsing the untracked log data or, if part of the required log has already been overwritten, diagnosed. In the latter case the changed page bitmap data is usable only from the latest LSN. The log reader thread then goes to a loop of waiting for an srv_checkpoint_
The log-writing thread behaviour is adjusted as follows. First, to ensure that the maximum tracked LSN age limit holds, all pending writes check it and delay log write operations if necessary (log_reserve_
Upon the slow server shutdown the logs_empty_
to loop until the log reader thread completely catches up with the written log.
Whenever the log reader thread wakes up, it reads and parses the log data as follows (log_online_
The in-memory changed page bitmap structure is an InnoDB red-black tree (ut0rb) of bitmap blocks. Each block is identified by the pair (space ID, ID of the first page in the block), where the first page ID must be a multiple of the bitmap block length. When the tree data is written to disk, its nodes are recycled into a free list. They are never released back to the heap, in order to prevent heap fragmentation.
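A minimal sketch of this structure, in Python for brevity. It is not the ut0rb-based implementation: a plain dict sorted at flush time stands in for the red-black tree, and all names are hypothetical. It also assumes that the "bitmap block length" is the number of pages one block covers (4056 bytes * 8 bits) and that bit 0 is the least significant bit of a byte.

```python
BITMAP_BYTES = 4056                 # bitmap payload of one block
PAGES_PER_BLOCK = BITMAP_BYTES * 8  # pages covered by one block (assumed)


class ChangedPageBitmap:
    """In-memory bitmap blocks keyed by (space id, first page id)."""

    def __init__(self):
        self.blocks = {}     # (space_id, first_page_id) -> bytearray
        self.free_list = []  # recycled buffers, never handed back to the heap

    def _get_block(self, space_id, first_page_id):
        key = (space_id, first_page_id)
        block = self.blocks.get(key)
        if block is None:
            if self.free_list:
                block = self.free_list.pop()
                block[:] = bytes(BITMAP_BYTES)  # zero the recycled buffer
            else:
                block = bytearray(BITMAP_BYTES)
            self.blocks[key] = block
        return block

    def mark_changed(self, space_id, page_id):
        """Set the bit for one (space, page) pair parsed from the log."""
        offset = page_id % PAGES_PER_BLOCK
        block = self._get_block(space_id, page_id - offset)
        block[offset // 8] |= 1 << (offset % 8)

    def flush(self):
        """Return the blocks in key order and recycle their buffers."""
        out = [(key, bytes(self.blocks[key])) for key in sorted(self.blocks)]
        self.free_list.extend(self.blocks.values())
        self.blocks.clear()
        return out
```

Reusing the buffers from the free list mirrors the stated goal of avoiding repeated heap allocation and release between bitmap writes.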
TODO: missing implementation items: 1) bitmap file rotate; 2) bitmap file rotate on user request; 3) INFORMATION_
Additional information in SHOW ENGINE INNODB STATUS
When log tracking is enabled, the following additional fields are displayed in the LOG section of the SHOW ENGINE INNODB STATUS output:
"Log tracked up to:" displays the LSN up to which all the changes have been parsed and stored as a bitmap on disk by the log tracking thread
"Max tracked LSN age:" displays the maximum limit on how far behind the log tracking thread may be.
File format
The changed page bitmap file consists of 4K blocks that form variable-length runs. Each run holds complete tracking information for a certain LSN interval, and each block has the following fields, listed as byte offset (width in bytes):
- 0 (4): Last block flag. 1 if the current block is the last one in the current run, 0 otherwise.
- 4 (8): Starting tracked LSN of the current run. Equal for all blocks in the same run.
- 12 (8): Last tracked LSN of the current run. Equal for all blocks in the same run.
- 20 (4): Space ID of the tracked pages in the current block.
- 24 (4): Page ID of the first tracked page in the current block.
- 28 (4): Unused space to align the start of bitmap data at 8 bytes.
- 32 (4056): The changed page bitmap.
- 4088 (4): Unused space to align the end of bitmap data at 8 bytes.
- 4092 (4): The checksum of the current page.
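The field list above maps directly onto a fixed-size record. The following Python sketch packs and parses one block and walks a file image backwards in the way the startup procedure describes. Several things here are assumptions: big-endian byte order, the helper names, and the checksum handling. The checksum algorithm is not described in this text, so verification is left to a caller-supplied function.

```python
import struct
from collections import namedtuple

BLOCK_SIZE = 4096
# flag(4) start_lsn(8) end_lsn(8) space(4) first_page(4) pad(4)
# bitmap(4056) pad(4) checksum(4); ">" = big-endian (assumed), no padding.
BLOCK_FORMAT = ">IQQII4x4056s4xI"
assert struct.calcsize(BLOCK_FORMAT) == BLOCK_SIZE

BitmapBlock = namedtuple(
    "BitmapBlock",
    "is_last start_lsn end_lsn space_id first_page_id bitmap checksum")


def build_block(is_last, start_lsn, end_lsn, space_id, first_page_id,
                bitmap=b"", checksum=0):
    """Pack one 4K block; a short bitmap is zero-padded to 4056 bytes."""
    return struct.pack(BLOCK_FORMAT, 1 if is_last else 0, start_lsn,
                       end_lsn, space_id, first_page_id, bitmap, checksum)


def parse_block(raw):
    """Unpack one 4K block into its fields (checksum is not verified)."""
    if len(raw) != BLOCK_SIZE:
        raise ValueError("a bitmap block must be exactly 4096 bytes")
    flag, start, end, space, first, bitmap, checksum = struct.unpack(
        BLOCK_FORMAT, raw)
    return BitmapBlock(flag == 1, start, end, space, first, bitmap, checksum)


def last_tracked_lsn(data, checksum_ok=lambda raw: True):
    """Walk backwards to the last block that is valid and ends a run."""
    usable = len(data) - len(data) % BLOCK_SIZE  # ignore a partial tail
    for offset in range(usable - BLOCK_SIZE, -1, -BLOCK_SIZE):
        raw = data[offset:offset + BLOCK_SIZE]
        if checksum_ok(raw) and parse_block(raw).is_last:
            return parse_block(raw).end_lsn
    return None  # no complete run found
```

For example, for a file image holding one complete run followed by an unfinished block and some trailing bytes, last_tracked_lsn returns the end LSN of the complete run.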
The bitmap is a straightforward uncompressed bitmap, indexed relative to the first tracked page of the block: byte 0, bit 0 corresponds to page 0, byte 0, bit 1 to page 1, byte 1, bit 0 to page 8, and so on. A single block holds 4056 bytes = 32448 bits of bitmap data. No bitmap compression is currently used. However, because each block stores the page ID of its first tracked page, blocks covering only unchanged page ranges can be left out, which limits the cost of sparse bitmaps somewhat, especially if only pages with high IDs are being changed.
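The mapping between page IDs and bit positions can be sketched as follows. The helper names are hypothetical, and the sketch assumes that bit 0 is the least significant bit of a byte and that positions are counted from the first tracked page of the block.

```python
BITMAP_BYTES = 4056
PAGES_PER_BLOCK = BITMAP_BYTES * 8  # 32448 bits per block


def locate_page(page_id, first_page_id=0):
    """Return (byte index, bit index) of a page inside its block."""
    offset = page_id - first_page_id
    if not 0 <= offset < PAGES_PER_BLOCK:
        raise ValueError("page is not covered by this block")
    return offset // 8, offset % 8


def changed_pages(first_page_id, bitmap):
    """Yield the IDs of all pages whose bit is set, in ascending order."""
    for byte_index, byte in enumerate(bitmap):
        for bit in range(8):
            if byte & (1 << bit):
                yield first_page_id + byte_index * 8 + bit
```

With these helpers, page 8 of a block starting at page 0 maps to byte 1, bit 0, which matches the description above.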
XtraBackup consumption
https:/
Instead of iterating over all data files to find the pages whose last modification LSN is greater than the LSN of the last full backup, XtraBackup reads the bitmap data to find this same set of pages.
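A rough sketch of how a backup tool could turn bitmap blocks into a per-tablespace list of pages to copy. This is not XtraBackup's actual code: the function name and the tuple layout are hypothetical, and skipping only the runs that end at or before the backup LSN is an assumption. A run that straddles the backup LSN is taken whole, which may copy a few extra pages but never misses one.

```python
def pages_to_copy(blocks, backup_lsn):
    """Collect changed pages per tablespace for runs newer than backup_lsn.

    blocks: iterable of (start_lsn, end_lsn, space_id, first_page_id, bitmap)
    tuples, one per bitmap block (hypothetical in-memory form).
    """
    result = {}
    for start_lsn, end_lsn, space_id, first_page_id, bitmap in blocks:
        if end_lsn <= backup_lsn:
            continue  # run is fully covered by the previous backup
        pages = result.setdefault(space_id, set())
        for byte_index, byte in enumerate(bitmap):
            for bit in range(8):
                if byte & (1 << bit):  # bit 0 assumed least significant
                    pages.add(first_page_id + byte_index * 8 + bit)
    # A page seen in several runs is reported once per tablespace.
    return {space: sorted(pages) for space, pages in result.items() if pages}
```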
Original description in bug 742162 (note that the current implementation has deviated):
This is a proposal from Peter.
Current incremental backups are a pain for large databases because they require a complete scan. The idea is to add a feature that is able
to track changes in the database and copy data only if it was changed. To support this, the server needs to be modified to have an option to maintain
the log of pages changed enabled by option innodb_
Innodb when will create series of log file ib_modified_
or reaching certain size (for example 1GB) (in the future we might add feature to rotate them)
The log file will contain records containing TIMESTAMP, LSN_FROM, LSN_TO, <LIST OF PAGES+TABLESPACES FLUSHED>. Each block should have its length and
checksum at the start of the block, so that a partial block written during a crash is detected.
When MySQL is about to write a series of pages to disk (i.e. when they are picked for the doublewrite buffer), we store the list of updated pages and
the LSN, and fsync() before the pages are written to their appropriate locations on disk.
We store both LSN_FROM and LSN_TO as checkpoint LSNs to be able to catch the case where data was corrupted in some way. For example, if we temporarily disabled
this functionality by mistake and then enabled it again, there will be a gap in the ranges (the next LSN_FROM will not match LSN_TO in the previous record), which
means the log will be unusable.
Integration With Xtrabackup:
Xtrabackup will have an option to read this set of log files. It will check the first record in each log file to understand from which log file it should start, and then
identify the last LSN_FROM which is smaller than the supplied argument. Then it will scan the log files to build the list of pages which need to be copied, sorting it by
tablespace number. Many pages will be seen multiple times in the log files, but they still need to be copied only once.
Xtrabackup will not need to enable or disable anything on the server, so multiple backup processes can continue to operate absolutely independently.
Size Calculation:
Assume we are flushing 100MB/sec (over 8TB/day), which is 6400 pages per second. If they are flushed in 100-page blocks (doublewrite), we will need to write:
64*(8+4+16+100*8) = ~ 53KB/sec or about 4.5GB per day. It also contains about 1/200 of data written from buffer pool to the disk which I consider acceptable overhead.
If we consider a more typical example, a 1TB database with about 10GB of data changed per day, 10GB of changes will require some 60MB of tracking data, which, if
incremental backups are done daily for a week, will come to less than 500MB in total, or 0.05% of the total database size.
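The arithmetic in the size calculation can be replayed with a few lines of Python. The 16KB page size is an assumption (it is what turns 100MB/sec into 6400 pages per second), and the constant names are made up for this sketch. The meaning of the 8+4+16 bytes of per-record overhead is not spelled out in the proposal, so it is used as given.

```python
PAGE_SIZE = 16 * 1024               # assumed page size in bytes
FLUSH_RATE = 100 * 1024 * 1024      # 100MB/sec of flushed data
PAGES_PER_RECORD = 100              # pages per doublewrite batch
RECORD_OVERHEAD = 8 + 4 + 16        # fixed bytes per record, as in the formula
BYTES_PER_PAGE_ENTRY = 8            # bytes per listed page, as in the formula

pages_per_sec = FLUSH_RATE // PAGE_SIZE
records_per_sec = pages_per_sec // PAGES_PER_RECORD
bytes_per_sec = records_per_sec * (
    RECORD_OVERHEAD + PAGES_PER_RECORD * BYTES_PER_PAGE_ENTRY)
bytes_per_day = bytes_per_sec * 86400

# Ratio of flushed data to tracking data under these assumptions.
flushed_to_tracked = FLUSH_RATE / bytes_per_sec
```

This reproduces the 6400 pages per second, roughly 53KB/sec and about 4.5GB per day. Under the same assumptions the ratio of flushed data to tracking data comes out near 2000 to 1, so the 1/200 figure and the 60MB estimate derived from it look like generous upper bounds rather than exact values.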
Blueprint information
- Status:
- Complete
- Approver:
- Alexey Kopytov
- Priority:
- High
- Drafter:
- Laurynas Biveinis
- Direction:
- Approved
- Assignee:
- Laurynas Biveinis
- Definition:
- Approved
- Series goal:
- Accepted for 5.1
- Implementation:
- Implemented
- Milestone target:
- 5.1.65-14.0
- Started by:
- Laurynas Biveinis
- Completed by:
- Alexey Kopytov
Related bugs
Bug #742162: Feature request: InnoDB changes tracking for incremental backup | Invalid |
Bug #937859: InnoDB error in server log of innodb_bug34300 | Fix Released |
Whiteboard
I am wondering if we should diagnose the maximum untracked log age violations as follows. Since these violations result in holes in an otherwise continuous tracked LSN range, save the start LSN of the current (last) uninterrupted tracking range. Include this value in the SHOW ENGINE INNODB STATUS output. On a maximum age violation, save the last tracked LSN. When the tracking resumes, print the hole interval to the error log. This way a DBA can diagnose when the bitmaps become partly unusable due to tracked LSN holes, and can also verify from the InnoDB status output that tracking has been uninterrupted.
Work Items
Work items:
[hrvojem] Documentation: TODO