Feature request: InnoDB changes tracking for incremental backup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server (moved to https://jira.percona.com/projects/PS) | Invalid | Wishlist | Laurynas Biveinis |
Bug Description
This is a proposal from Peter.

Incremental backups are currently painful for large databases because they require a complete scan. The idea is to add a feature that can track changes in the database and copy data only if it has changed. To support this, the server needs to be modified to have an option to maintain a log of changed pages, enabled by an innodb_ option.

InnoDB will create a series of log files named ib_modified_, starting a new one on reaching a certain size (for example 1 GB). (In the future we might add a feature to rotate them.)
The log file will contain records of the form TIMESTAMP, LSN_FROM, LSN_TO, <LIST OF PAGES+TABLESPACES FLUSHED>. Each block should carry its length and a checksum at the start of the block, so that a partial block written during a crash can be detected.

When MySQL is about to write a series of pages to disk (i.e. when they are picked for the doublewrite buffer), we store the list of updated pages and the LSN, and fsync() before the pages are written to their final locations on disk.

We store both LSN_FROM and LSN_TO as checkpoint LSNs to be able to catch cases where the data was corrupted in some way. For example, if we temporarily disabled this functionality by mistake and later re-enabled it, there would be a gap in the ranges (the next record's LSN_FROM would not match the previous record's LSN_TO), which means the log is unusable.
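As a rough illustration, here is a minimal Python sketch of such a block format. The field names, widths, and the CRC32 checksum are assumptions for illustration only, not InnoDB's actual on-disk layout:

```python
import struct
import zlib

# Hypothetical record layout: a header carrying the block length and a
# checksum first, followed by TIMESTAMP, LSN_FROM, LSN_TO and the page list.
LEN_CRC = struct.Struct("<II")   # block length (incl. header), CRC32 of body
FIXED = struct.Struct("<QQQ")    # timestamp, lsn_from, lsn_to
PAGE = struct.Struct("<IQ")      # tablespace id, page number

def write_block(timestamp, lsn_from, lsn_to, pages):
    """Serialize one tracking record; length and checksum go first."""
    body = FIXED.pack(timestamp, lsn_from, lsn_to)
    body += b"".join(PAGE.pack(space, page) for space, page in pages)
    return LEN_CRC.pack(LEN_CRC.size + len(body), zlib.crc32(body)) + body

def read_blocks(buf):
    """Yield records; stop at a torn block, raise on an LSN-range gap."""
    off, prev_to = 0, None
    while off + LEN_CRC.size <= len(buf):
        length, crc = LEN_CRC.unpack_from(buf, off)
        body = buf[off + LEN_CRC.size : off + length]
        if len(body) != length - LEN_CRC.size or zlib.crc32(body) != crc:
            break  # partial block written during a crash: detected, stop here
        ts, lsn_from, lsn_to = FIXED.unpack_from(body)
        if prev_to is not None and lsn_from != prev_to:
            raise ValueError("gap in LSN ranges: tracking log unusable")
        prev_to = lsn_to
        pages = [PAGE.unpack_from(body, FIXED.size + i * PAGE.size)
                 for i in range((len(body) - FIXED.size) // PAGE.size)]
        yield ts, lsn_from, lsn_to, pages
        off += length
```

Putting the length and checksum at the head of each block is what lets a reader treat a torn tail as end-of-log rather than corruption, while the LSN_FROM/LSN_TO chaining catches the disabled-and-re-enabled gap case described above.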
Integration with XtraBackup:

XtraBackup will have an option to read this set of log files. It will check the first record in each log file to determine which file to start from, identifying the last LSN_FROM that is smaller than the supplied argument. It will then scan the log files to build the list of pages that need to be copied, sorted by tablespace number. Many pages will appear multiple times in the log files, but each still needs to be copied only once.

XtraBackup will not need to enable or disable anything on the server, so multiple backup processes can operate completely independently.
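The dedup-and-sort step XtraBackup would perform could look like this sketch, assuming records have already been parsed into `(lsn_from, lsn_to, pages)` tuples (the function name is illustrative):

```python
def pages_to_copy(records, since_lsn):
    """Collect pages from records whose LSN range ends after since_lsn,
    deduplicated and sorted by (tablespace id, page number)."""
    wanted = set()
    for lsn_from, lsn_to, pages in records:
        if lsn_to <= since_lsn:
            continue  # fully covered by the previous incremental backup
        wanted.update(pages)  # a set: repeated pages are copied only once
    return sorted(wanted)
```

Sorting by (tablespace, page) lets the copy phase walk each tablespace file sequentially instead of seeking back and forth.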
Size calculation:

Assume we are flushing 100 MB/sec (over 8 TB/day), which is 6400 pages per second. If they are flushed in 100-page blocks (doublewrite), we will need to write 64 * (8 + 4 + 16 + 100*8) = ~53 KB/sec, or about 4.5 GB per day. That is roughly 1/2000 of the data written from the buffer pool to disk, which I consider acceptable overhead.

A more typical example for this case is a 1 TB database with about 10 GB of data changed per day. 10 GB of changes will require some 60 MB of change tracking, so a week of daily incremental backups will accumulate less than 500 MB of tracking logs in total, which is about 0.05% of the total database size.
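The arithmetic can be checked with a short script. The per-record byte breakdown 8 + 4 + 16 + 100*8 is taken straight from the estimate above, and 16 KB is the default InnoDB page size:

```python
PAGE_SIZE = 16 * 1024                      # default InnoDB page size
flush_rate = 100 * 1024**2                 # 100 MB/sec flushed from the buffer pool
pages_per_sec = flush_rate // PAGE_SIZE    # pages flushed per second
blocks_per_sec = pages_per_sec // 100      # flushed in 100-page doublewrite batches

# Per record, per the formula above: 8 + 4 + 16 + 100 * 8 bytes
record_bytes = 8 + 4 + 16 + 100 * 8
tracking_rate = blocks_per_sec * record_bytes   # tracking bytes written per second
per_day = tracking_rate * 86400                 # tracking bytes written per day
overhead = flush_rate / tracking_rate           # flushed data vs. tracking data
```

This reproduces the ~53 KB/sec and ~4.5 GB/day figures, and shows the tracking log amounts to roughly 1/2000 of the data flushed from the buffer pool.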
tags: added: fr
Changed in percona-server:
status: Confirmed → Triaged
Changed in percona-server:
assignee: nobody → Laurynas Biveinis (laurynas-biveinis)
Would it be possible to implement it as an InnoDB table, so the
changes are part of the normal transactional process and don't require
extra fsyncs? Then instead of naming a file to write the changes to,
the command-line option would specify a table name to insert rows into.