Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Bug #1180802
Comment #5

Comment 5 for bug 1180802

Revision history for this message

Jay Janssen (jay-janssen) wrote on 2013-05-16: Re: [Bug 1180802] wsrep_recover position unexpectedly gets RBR error

On May 16, 2013, at 12:49 PM, Raghavendra D Prabhu <email address hidden> wrote:

> Interesting.
>
> a) Has it been always 1032 or has it been 1062 (Duplicate entry)
> anytime?
>

When I saw it it was only the Delete Event.

> b) What is the sysbench that is running (since there is a Delete_rows
> event). Also, I presume, from description, that sysbench is running
> while you SIGKILL the other node's mysqld?

some version of sysbench 0.5 from lefred (search for 'lefred sysbench rpm').

sysbench --test=sysbench_tests/db/common.lua \
--mysql-host=node1 --mysql-user=test --mysql-db=test --oltp-table-size=250000 \
prepare

sysbench --test=sysbench_tests/db/oltp.lua --mysql-host=node1 --mysql-user=test --mysql-db=test --oltp-table-size=250000 --report-interval=1 --max-requests=0 --tx-rate=10 run

>
> c) Now, from ecec24ed-ab72-11e2-0800-008b24f29e8e:194044 and the error,
> it looks like it is failing at the very next event.
>
> Now, this means, when it got killed, only the events upto a seqno lower
> than 194044 actually made to InnoDB whereas to the log it was recorded
> as 194044.

Ok, so I can reproduce it the problem (more reliably than I'm comfortable with):

130516 12:28:01 InnoDB: Waiting for the background threads to start
130516 12:28:02 Percona XtraDB (http://www.percona.com) 5.5.30-rel30.2 started; log sequence number 885672889
130516 12:28:02 [Note] Event Scheduler: Loaded 0 events
130516 12:28:02 [Note] WSREP: Signalling provider to continue.
130516 12:28:02 [Note] WSREP: SST received: ecec24ed-ab72-11e2-0800-008b24f29e8e:281000
130516 12:28:02 [Note] WSREP: Receiving IST: 183 writesets, seqnos 281000-281183
130516 12:28:02 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.30' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), wsrep_23.7.4.r3843
130516 12:28:02 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.sbtest1; Can't find record in 'sbtest1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 750, Error_code: 1032
130516 12:28:02 [Warning] WSREP: RBR event 4 Delete_rows apply warning: 120, 281001
130516 12:28:02 [ERROR] WSREP: receiving IST failed, node restart required: Failed to apply app buffer: seqno: 281001, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
130516 12:28:02 [Note] WSREP: Closing send monitor...
130516 12:28:02 [Note] WSREP: Closed send monitor.
130516 12:28:02 [Note] WSREP: gcomm: terminating thread
130516 12:28:02 [Note] WSREP: gcomm: joining thread
130516 12:28:02 [Note] WSREP: gcomm: closing backend
130516 12:28:03 [Note] WSREP: view(view_id(NON_PRIM,0807cf68-b18d-11e2-0800-0a143587da15,15) memb {
        917fa1cd-be45-11e2-0800-a5f5946af022,

Seems like we fail on 281001 again here. If I hack the grastate to start at 281002, it does indeed restart right.

>
> What is the value of innodb-flush-log-at-trx-commit.

Default. Entire config:

[root@node3 mysql]# cat /etc/my.cnf
[mysqld]
datadir = /var/lib/mysql
log_error = error.log

binlog_format = ROW

innodb_log_file_size = 64M

wsrep_cluster_name = mycluster
wsrep_cluster_address = gcomm://192.168.70.2,192.168.70.3,192.168.70.4
wsrep_node_name = node3
wsrep_node_address = 192.168.70.4

wsrep_provider = /usr/lib/libgalera_smm.so

wsrep_sst_method = xtrabackup
wsrep_sst_auth = sst:secret

wsrep_slave_threads = 2

innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2

[mysql]
prompt = "node3 mysql> "

[client]
user = root

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen

On May 16, 2013, at 12:49 PM, Raghavendra D Prabhu <raghavendra.prabhu@percona.com> wrote:

> Interesting.
> 
> a) Has it been always 1032  or has it been 1062 (Duplicate entry)
> anytime?
>

When I saw it it was only the Delete Event.

> b) What is the sysbench that is running (since there is a Delete_rows
> event). Also, I presume, from description, that sysbench is running
> while you SIGKILL the other node's mysqld?

some version of sysbench 0.5 from lefred (search for 'lefred sysbench rpm').

sysbench --test=sysbench_tests/db/common.lua \
        --mysql-host=node1 --mysql-user=test --mysql-db=test --oltp-table-size=250000 \
        prepare

sysbench --test=sysbench_tests/db/oltp.lua --mysql-host=node1 --mysql-user=test --mysql-db=test --oltp-table-size=250000 --report-interval=1 --max-requests=0 --tx-rate=10 run

> 
> c) Now, from ecec24ed-ab72-11e2-0800-008b24f29e8e:194044 and the error,
> it looks like it is failing at the very next event.
> 
> Now, this means, when it got killed, only the events upto a seqno lower
> than 194044 actually made to InnoDB whereas to the log it was recorded
> as 194044.

Ok, so I can reproduce it the problem (more reliably than I'm comfortable with):

130516 12:28:01  InnoDB: Waiting for the background threads to start
130516 12:28:02 Percona XtraDB (http://www.percona.com) 5.5.30-rel30.2 started; log sequence number 885672889
130516 12:28:02 [Note] Event Scheduler: Loaded 0 events
130516 12:28:02 [Note] WSREP: Signalling provider to continue.
130516 12:28:02 [Note] WSREP: SST received: ecec24ed-ab72-11e2-0800-008b24f29e8e:281000
130516 12:28:02 [Note] WSREP: Receiving IST: 183 writesets, seqnos 281000-281183
130516 12:28:02 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.30'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Percona XtraDB Cluster (GPL), wsrep_23.7.4.r3843
130516 12:28:02 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.sbtest1; Can't find record in 'sbtest1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 750, Error_code: 1032
130516 12:28:02 [Warning] WSREP: RBR event 4 Delete_rows apply warning: 120, 281001
130516 12:28:02 [ERROR] WSREP: receiving IST failed, node restart required: Failed to apply app buffer: seqno: 281001, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
130516 12:28:02 [Note] WSREP: Closing send monitor...
130516 12:28:02 [Note] WSREP: Closed send monitor.
130516 12:28:02 [Note] WSREP: gcomm: terminating thread
130516 12:28:02 [Note] WSREP: gcomm: joining thread
130516 12:28:02 [Note] WSREP: gcomm: closing backend
130516 12:28:03 [Note] WSREP: view(view_id(NON_PRIM,0807cf68-b18d-11e2-0800-0a143587da15,15) memb {
        917fa1cd-be45-11e2-0800-a5f5946af022,

Seems like we fail on 281001 again here.  If I hack the grastate to start at 281002, it does indeed restart right.

> 
> What is the value of innodb-flush-log-at-trx-commit.

Default.  Entire config:

[root@node3 mysql]# cat /etc/my.cnf
[mysqld]
datadir                         = /var/lib/mysql
log_error                       = error.log

binlog_format                   = ROW

innodb_log_file_size            = 64M

wsrep_cluster_name              = mycluster
wsrep_cluster_address           = gcomm://192.168.70.2,192.168.70.3,192.168.70.4
wsrep_node_name                 = node3
wsrep_node_address              = 192.168.70.4

wsrep_provider                  = /usr/lib/libgalera_smm.so

wsrep_sst_method                = xtrabackup
wsrep_sst_auth                  = sst:secret

wsrep_slave_threads             = 2

innodb_locks_unsafe_for_binlog  = 1
innodb_autoinc_lock_mode        = 2

[mysql]
prompt                          = "node3 mysql> "

[client]
user                            = root

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen