A follow-up fix for buffer pool mutex split patch might be suboptimal, commit 1

Bug #1728509 reported by Laurynas Biveinis
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Invalid
Undecided
Unassigned
5.6
Invalid
Undecided
Unassigned
5.7
Invalid
Undecided
Unassigned
8.0
Fix Committed
Wishlist
Unassigned

Bug Description

A copy of https://bugs.mysql.com/bug.php?id=85205:

[27 Feb 13:09] Laurynas Biveinis
Description:
One of the follow-up commits to the buffer pool mutex split patch (bug 75534) at is [1]. It is not entirely clear what problem is it fixing and whether it's done in an optimal way.

The commit message describes a race condition between one thread fixing a dirty block and another flushing and freeing it. This is indeed a scenario bug 75534 patch must handle. This commit addresses it by adding buffer pool block mutex protection to many, but not all, block->page.buf_fix_count writes. We haven't encountered this particular bug, and the commit message is vague enough, so I'll assume that it was necessary to serialize between 1) all the functions calling buf_fix_inc that this patch added block mutex protection for, and 2) buf_LRU_free_page doing the page eviction. Thus, I'll assume that dirty -> clean transition by flushing is not an issue, only eviction is. Given this:

- Why there is need to add protection to buf_fix_dec calls? These do not serialize with eviction in any way, do they?
- It looks like there is no need to add protection to buf_page_get_zip, because it is already serialized with buf_LRU_free_page by the page hash latch.
- Likewise for buf_page_try_get_func.
- The commit introduces a new block mutex critical section around buf_page_get_io_fix_unlocked call in buf_page_get_gen. I/O fix is not buffer pool fix, the commit message does not describe this change. Why is it necessary? If it is indeed necessary, then there is no need to use _unlocked variant of the function, buf_page_get_io_fix should be called instead.
- If the above is correct, then the commit reduces to a single correct change, in a UNIV_IBUF_DEBUG section of buf_page_get_gen

Again, it's hard to tell without knowing more details about the original bug you fixed, at least the stacktraces.

[1]:

commit 65649a16ce9412f7d9286e7e6ad6c2de5769be25
Author: Shaohua Wang <email address hidden>
Date: Thu Apr 21 13:34:00 2016 +0800

    BUG#23067038 ASSERTION FAILURE: BUF0BUF.CC:2861:BUF_PAGE_IN_FILE(BPAGE)
                 LEADS TO CORRUPT DATA

    It's a regression of wl#8423 InnoDB: Split the buffer pool mutex,a
    in which we removed block mutex protection for buf_fix_count.

    The assertion happens when one thread fixed a dirty block but the
    other thread flushed the block and moved the page from LRU list to
    free list, because it saw buf_fix_count is 0, other than 1.

    The solution is holding block mutex when buf fix and unfix.

    Reviewed-by: Debarun Banerjee <email address hidden>
    RB: 12456

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :
tags: added: performance regression upstream
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-3754

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.