Parallel doublewrite buffer

Registered by Laurynas Biveinis on 2016-02-25

Implement parallel doublewrite buffer.

The existing doublewrite buffer is shared between all the buffer pool instances and all the flusher threads. PMP shows that it's becoming a point of contention with flusher threads waiting for the running batch to end so that they can post a page to flush. This limits the effect of extra cleaner threads. Moreover, single page flushes wait for the above and also contend on the doublewrite mutex.

Fix this by introducing private doublewrite buffers for each buffer pool instance, for each batch flushing mode (LRU or flush list). For example, with four buffer pool instances, there will be eight doublewrite shards. Only one flusher thread can access any shard at a time, and each shard is added to and flushed completely independently of the rest. This does away with the mutex, and the event wait no longer blocks other threads from proceeding: it only waits for the async I/O to complete. The only inter-thread synchronization is between the flusher thread and the I/O completion threads.
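The shard arithmetic can be illustrated with a minimal sketch. The names FlushMode, shard_count and shard_index are hypothetical, and the ordering of shards (instance-major, LRU first) is an assumption made for illustration only; the blueprint states only that there is one shard per buffer pool instance per batch flushing mode.

```python
from enum import IntEnum


class FlushMode(IntEnum):
    """The two batch flushing modes named in the text."""
    LRU = 0
    FLUSH_LIST = 1


def shard_count(buffer_pool_instances: int) -> int:
    """One private shard per buffer pool instance per flushing mode."""
    return buffer_pool_instances * len(FlushMode)


def shard_index(instance: int, mode: FlushMode) -> int:
    """Map (instance, mode) to a shard number.

    Assumption: shards are numbered instance by instance, LRU first.
    The real layout is not specified in the blueprint.
    """
    return instance * len(FlushMode) + int(mode)
```

With four buffer pool instances this gives eight shards, and every (instance, mode) pair maps to a distinct shard, so no two batch flushes share a buffer.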

The new doublewrite buffer is contained in a new file, which holds all the shards at different offsets. This file is created on startup and removed on a clean shutdown. If it is found on startup of a crashed instance, its contents are read and any torn pages are restored. If it is found on startup of a cleanly shut down instance, the server startup is aborted with an error message.
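The startup rules above amount to a small decision table, sketched below. StartupAction and startup_action are hypothetical names, not server code. The case of a crashed instance with no doublewrite file present is not covered by the blueprint; treating it as plain file creation is an assumption.

```python
from enum import Enum


class StartupAction(Enum):
    CREATE = "create a new doublewrite file"
    RECOVER = "read the file and restore any torn pages"
    ABORT = "abort startup with an error message"


def startup_action(file_exists: bool, crashed: bool) -> StartupAction:
    """Decide what to do with the parallel doublewrite file on startup."""
    if not file_exists:
        # Assumption: a missing file is simply created, whether or not
        # the previous shutdown was clean.
        return StartupAction.CREATE
    if crashed:
        # A leftover file after a crash may hold good copies of torn pages.
        return StartupAction.RECOVER
    # A clean shutdown removes the file, so finding one is unexpected.
    return StartupAction.ABORT
```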

The location of the doublewrite file is governed by a new global, read-only system variable, innodb_parallel_doublewrite_path. It defaults to xb_doublewrite in the data directory. The variable accepts both absolute and relative paths; relative paths are treated as relative to the data directory. The doublewrite file is not a tablespace from the point of view of InnoDB internals.
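The path rule can be sketched as follows. The function name resolve_doublewrite_path is hypothetical and the example directories are made up; only the default file name xb_doublewrite and the "relative to the data directory" rule come from the text. POSIX path semantics are assumed.

```python
import posixpath

DEFAULT_DOUBLEWRITE_NAME = "xb_doublewrite"


def resolve_doublewrite_path(datadir: str,
                             setting: str = DEFAULT_DOUBLEWRITE_NAME) -> str:
    """Resolve an innodb_parallel_doublewrite_path value to a file path.

    Absolute values are used as given; relative values are interpreted
    relative to the data directory.
    """
    if posixpath.isabs(setting):
        return setting
    return posixpath.join(datadir, setting)
```

For example, with a data directory of /var/lib/mysql, the default resolves to /var/lib/mysql/xb_doublewrite, while an absolute value such as /mnt/fast/dblwr is left unchanged.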

The legacy InnoDB doublewrite buffer in the system tablespace continues to address the doublewrite needs of single page flushes, and they are free to use the whole of that buffer (128 pages by default) instead of only the last eight pages as currently. Note that single page flushes will not happen in Percona Server unless innodb_empty_free_list_algorithm is set to the "legacy" value.

The existing transaction system header in the system tablespace and the existing doublewrite buffer are not touched in any way. Thus perfect cross-grade compatibility is ensured on clean shutdowns.

Interaction with innodb_flush_method. Regardless of its setting, the parallel doublewrite file is opened with the O_DIRECT flag to bypass OS caching. Access to the file is then further governed by the innodb_flush_method setting: if it is set to O_DSYNC, the parallel doublewrite file is opened with the O_SYNC flag too. Further, if it is one of O_DSYNC, O_DIRECT_NO_FSYNC, or ALL_O_DIRECT, then the doublewrite file is not flushed after a batch of writes to it is completed.
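The interaction with innodb_flush_method can be summarized as a small lookup, sketched below. DoublewriteIoPolicy and doublewrite_io_policy are hypothetical names; flags are represented as strings rather than real open(2) constants so the sketch stays portable. It only restates the rules in the paragraph above and is not the server's implementation.

```python
from typing import FrozenSet, NamedTuple


class DoublewriteIoPolicy(NamedTuple):
    open_flags: FrozenSet[str]   # flags used when opening the file
    flush_after_batch: bool      # whether the file is flushed after a batch


# Settings for which the file is not flushed after a batch of writes.
NO_FLUSH_METHODS = frozenset({"O_DSYNC", "O_DIRECT_NO_FSYNC", "ALL_O_DIRECT"})


def doublewrite_io_policy(flush_method: str) -> DoublewriteIoPolicy:
    """Derive the parallel doublewrite file I/O policy from innodb_flush_method."""
    flags = {"O_DIRECT"}          # always set, regardless of the setting
    if flush_method == "O_DSYNC":
        flags.add("O_SYNC")       # added on top of O_DIRECT
    return DoublewriteIoPolicy(
        open_flags=frozenset(flags),
        flush_after_batch=flush_method not in NO_FLUSH_METHODS,
    )
```

Under these rules a setting of O_DIRECT still flushes the file after each batch, while O_DIRECT_NO_FSYNC and ALL_O_DIRECT do not.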

Upstream bugs for the "fixed in Percona Server" list:

Blueprint information

Laurynas Biveinis
Laurynas Biveinis
Laurynas Biveinis
Series goal:
Accepted for 5.7
Milestone target:
5.7.11-4
Started by
Laurynas Biveinis on 2016-02-25
Completed by
Laurynas Biveinis on 2016-03-08


