percona 5.7.13-6-1 SLAVE SQL_THREAD memory leaks on the active server with Master-Master replication when relay_log_info_repository=TABLE

Bug #1594964 reported by Muayad
This bug affects 2 people
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Invalid
Undecided
Unassigned
5.6
Invalid
Undecided
Unassigned
5.7
Fix Released
High
Vlad Lesin

Bug Description

We have two servers with Master-Master GTID replication in active-passive mode. After upgrading to 5.7.11/5.7.13, the active master started to run out of memory frequently, even with the performance schema turned off. Just stopping and starting the SLAVE SQL_THREAD brings memory usage back to normal (InnoDB buffer pool size + 8% of InnoDB buffer pool size),
i.e.: STOP SLAVE SQL_THREAD; START SLAVE SQL_THREAD

 The issue has been reproduced on a test environment while the active server was running in debug mode (Massif/Valgrind).
 How to reproduce:
  1- set up master master with GTID replication.
  2- create one database and one table:
     create table mytest ( id int not null auto_increment primary key, c1 varchar(128), c2 varchar(128) ) engine = innodb;
  3- continuously run these queries on it and check RES memory usage:
     SELECT count(DISTINCT id) from mytest;
     INSERT into mytest (c1,c2) VALUES ('VAL1','VAL2')
     UPDATE mytest.mytest set c1='newval1' where id =

 attached complete massif.out and globalvariables.out. Here is a snippet of massif.out; not sure if pfs_spawn_thread is related to the performance schema, but the mysql instance was started with the performance schema off.

 ->09.32% (578,806,592B) 0x188E91C: my_raw_malloc (in /usr/sbin/mysqld-debug)
| ->09.32% (578,806,592B) 0x188E519: my_malloc (in /usr/sbin/mysqld-debug)
| ->09.03% (561,046,608B) 0x1887A44: alloc_root (in /usr/sbin/mysqld-debug)
| | ->08.75% (543,172,776B) 0xEFC948: Sql_alloc::operator new(unsigned long, st_mem_root*) (in /usr/sbin/mysqld-debug)
| | | ->06.25% (388,424,352B) 0x153B929: LEX::new_query(st_select_lex*) (in /usr/sbin/mysqld-debug)
| | | | ->06.25% (388,424,352B) 0x153BEEE: LEX::new_top_level_query() (in /usr/sbin/mysqld-debug)
| | | | ->06.25% (388,424,352B) 0x153B5E2: lex_start(THD*) (in /usr/sbin/mysqld-debug)
| | | | ->06.25% (388,424,352B) 0x1858CB9: Rpl_info_table_access::before_open(THD*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x181FA59: System_table_access::open_table(THD*, st_mysql_lex_string, st_mysql_lex_string, unsigned int, thr_lock_type, TABLE**, Open_tables_backup*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x186E3D3: Rpl_info_table::do_flush_info(bool) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x184BED8: Rpl_info_handler::flush_info(bool) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x1852F79: Relay_log_info::flush_info(bool) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x18502B1: Relay_log_info::inc_group_relay_log_pos(unsigned long long, bool) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x17D0034: Rotate_log_event::do_update_pos(Relay_log_info*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x184600E: Log_event::update_pos(Relay_log_info*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x18350C4: apply_event_and_update_pos(Log_event**, THD*, Relay_log_info*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x1835EC3: exec_relay_log_event(THD*, Relay_log_info*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x183C23C: handle_slave_sql (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x18CBD18: pfs_spawn_thread (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x4E340A2: start_thread (pthread_create.c:309)
| | | | | ->06.25% (388,424,352B) 0x6E2687B: clone (clone.S:111)

Thanks,
Muayad Yousef

Revision history for this message
Muayad (muayad-yo) wrote :
summary: - percona 5.7.11-4 SLAVE SQL_THREAD meory leaks on the active server
+ percona 5.7.11-4 SLAVE SQL_THREAD memory leaks on the active server
with Master-Master replication
Revision history for this message
Muayad (muayad-yo) wrote : Re: percona 5.7.11-4 SLAVE SQL_THREAD memory leaks on the active server with Master-Master replication
Revision history for this message
Muayad (muayad-yo) wrote : Re: percona 5.7.11-4 SLAVE SQL_THREAD memory leaks on the active server with Master-Master replication when relay_log_info_repository=TABLE

this happens when setting relay_log_info_repository=TABLE, there is no memory leak when switched to relay_log_info_repository=FILE

summary: percona 5.7.11-4 SLAVE SQL_THREAD memory leaks on the active server
- with Master-Master replication
+ with Master-Master replication when relay_log_info_repository=TABLE
Muayad (muayad-yo)
summary: - percona 5.7.11-4 SLAVE SQL_THREAD memory leaks on the active server
+ percona 5.7.13-6-1 SLAVE SQL_THREAD memory leaks on the active server
with Master-Master replication when relay_log_info_repository=TABLE
description: updated
Revision history for this message
Przemek (pmalkowski) wrote :

I am able to reproduce the memory leak on both MySQL Community 5.7 and Percona Server 5.7.
The important settings to make it reproducible are:
log-bin=mysql-bin
log-slave-updates
sync_relay_log_info=1
master-info-repository=table
relay-log-info-repository=table
gtid_mode=ON
enforce-gtid-consistency

The leak is sensitive to a low sync_relay_log_info value: with the 5.7 default of 10000 the memory leak is far less visible, while with a value of 1 memory grows very fast.
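A back-of-the-envelope model of that sensitivity (a Python sketch; the 4 KiB per-flush figure is an assumption for illustration, not a measured constant): the relay log info table is flushed roughly once per sync_relay_log_info events, and each flush leaves a chunk behind in the reused thd's mem_root.

```python
# Hedged sketch: bytes_per_flush is an assumed constant, not a measurement.
def leaked_bytes(events, sync_relay_log_info, bytes_per_flush=4096):
    """Approximate leak: one mem_root allocation per relay-log-info flush."""
    flushes = events // sync_relay_log_info
    return flushes * bytes_per_flush

events = 300_000  # roughly the sysbench request count used below

for sync in (1, 10, 10_000):
    mb = leaked_bytes(events, sync) / (1024 * 1024)
    print(f"sync_relay_log_info={sync:>6}: ~{mb:.1f} MB leaked")
```

With these assumed numbers, sync_relay_log_info=1 leaks on the order of a gigabyte over a 300k-event run, while the default of 10000 stays in the hundreds of kilobytes.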

Quick steps to reproduce using mysqlsandbox (the memory instrumentation is enabled just to illustrate the effect):

make_replication_sandbox --master_master Percona-Server-5.7.17-11-Linux.x86_64.ssl100.tar.gz
cd rcsandbox_Percona-Server-5_7_17
for i in {1..2}; do echo "sync_relay_log_info=1" >> node$i/my.sandbox.cnf ; done
for i in {1..2}; do echo "performance-schema-instrument='memory%=ON' " >> node$i/my.sandbox.cnf ; done

./enable_gtid

node1 [localhost] {msandbox} ((none)) > create database sbtest1;
Query OK, 1 row affected (0.02 sec)

$ sysbench --num-threads=16 --max-requests=300000 --db-driver=mysql --test=oltp --mysql-table-engine=InnoDB --mysql-db=sbtest1 --mysql-user=msandbox --mysql-password=msandbox --oltp-table-size=1000000 --mysql-socket=/tmp/mysql_sandbox15701.sock --test=/usr/share/doc/sysbench/tests/db/oltp.lua prepare
sysbench 0.5: multi-threaded system evaluation benchmark

Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest1'

$ sysbench --num-threads=16 --max-requests=300000 --db-driver=mysql --test=oltp --mysql-table-engine=InnoDB --mysql-db=sbtest1 --mysql-user=msandbox --mysql-password=msandbox --oltp-table-size=1000000 --mysql-socket=/tmp/mysql_sandbox15701.sock --test=/usr/share/doc/sysbench/tests/db/oltp.lua run
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 16
(...)
    total number of events: 300124
(...)

$ ps aux|grep rcsandbox_Percona-Server-5_7_17|grep .sock|awk '{print "vsz: "$5/1024" MB","rss: "$6/1024" MB",$19}'
vsz: 2336.39 MB rss: 1966.38 MB --port=15701
vsz: 732.328 MB rss: 380.527 MB --port=15702

node1 [localhost] {msandbox} ((none)) > select * from sys.memory_global_total;
+-----------------+
| total_allocated |
+-----------------+
| 1.51 GiB |
+-----------------+
1 row in set (0.01 sec)

node1 [localhost] {msandbox} ((none)) > select event_name, high_number_of_bytes_used/1024/1024 high_number_of_bytes_used_MB from performance_schema.memory_summary_global_by_event_name order by high_number_of_bytes_used desc limit 10;
+----------------------------------------------------------------------+------------------------------+
| event_name | high_number_of_bytes_used_MB |
+----------------------------------------------------------------------+------------------------------+
| memory/sql/thd::main_mem_root | 1194.45265198 |
| memory/innodb/buf_buf_pool | 133.25000000 |
| memory/innodb/log0log | 32.08451843 |
| memory/sql/XID ...


tags: added: i169279
Revision history for this message
Przemek (pmalkowski) wrote :

With sync_relay_log_info=10, after the same number of sysbench requests:

node1 [localhost] {msandbox} ((none)) > select * from sys.memory_global_total;
+-----------------+
| total_allocated |
+-----------------+
| 548.28 MiB |
+-----------------+
1 row in set (0.00 sec)

node1: vsz: 1044.39 MB rss: 623.004 MB --port=15701
node2: vsz: 756.328 MB rss: 384.484 MB --port=15702
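From the two total_allocated readings above (1.51 GiB with sync_relay_log_info=1, 548.28 MiB with sync_relay_log_info=10, after the same ~300k sysbench events), one can make a rough, assumption-laden Python estimate of the per-flush cost, attributing the whole difference to the extra flushes done at sync_relay_log_info=1:

```python
# Rough estimate of bytes leaked per relay-log-info flush, derived from the
# two sys.memory_global_total readings above. All numbers are approximations.
GiB = 1024 ** 3
MiB = 1024 ** 2

total_sync1 = 1.51 * GiB      # total_allocated with sync_relay_log_info=1
total_sync10 = 548.28 * MiB   # total_allocated with sync_relay_log_info=10
events = 300_124              # sysbench "total number of events"

# flushes performed at sync=1 but skipped at sync=10
extra_flushes = events - events // 10
per_flush = (total_sync1 - total_sync10) / extra_flushes
print(f"~{per_flush:.0f} bytes leaked per flush")
```

The result lands in the neighborhood of one 4 KiB page per flush, which is at least consistent with a fixed-size LEX/mem_root allocation per flush cycle.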

Related upstream bug: https://bugs.mysql.com/bug.php?id=85034

tags: added: upstream
Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

It looks like the reason is the same as described here: https://bugs.launchpad.net/percona-server/+bug/1667307/comments/6

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

I was wrong about the reasons in my previous comment.

So, what is happening here:

1) In Rpl_info_table::do_init_info() the thd is obtained with the following code:
  THD *thd= access->create_thd();

2) If we look into Rpl_info_table_access::create_thd() we see that a thd is created only if current_thd is NULL; otherwise current_thd is returned. In our case that means the replication thread's own thd is used to store data in the rpl info table.

THD *Rpl_info_table_access::create_thd()
{
  THD *thd= current_thd;

  if (!thd)
  {
    thd= System_table_access::create_thd();
    thd->system_thread= SYSTEM_THREAD_INFO_REPOSITORY;
    thd_created= true;
  }

  return(thd);
}

3) If we look at the massif.out file we see that memory is allocated with the placement form of operator new, with thd->mem_root passed as the parameter, so the memory is allocated in the thd->mem_root arena.

 ->09.32% (578,806,592B) 0x188E91C: my_raw_malloc (in /usr/sbin/mysqld-debug)
| ->09.32% (578,806,592B) 0x188E519: my_malloc (in /usr/sbin/mysqld-debug)
| ->09.03% (561,046,608B) 0x1887A44: alloc_root (in /usr/sbin/mysqld-debug)
| | ->08.75% (543,172,776B) 0xEFC948: Sql_alloc::operator new(unsigned long, st_mem_root*) (in /usr/sbin/mysqld-debug)
| | | ->06.25% (388,424,352B) 0x153B929: LEX::new_query(st_select_lex*) (in /usr/sbin/mysqld-debug)
| | | | ->06.25% (388,424,352B) 0x153BEEE: LEX::new_top_level_query() (in /usr/sbin/mysqld-debug)
| | | | ->06.25% (388,424,352B) 0x153B5E2: lex_start(THD*) (in /usr/sbin/mysqld-debug)
| | | | ->06.25% (388,424,352B) 0x1858CB9: Rpl_info_table_access::before_open(THD*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x181FA59: System_table_access::open_table(THD*, st_mysql_lex_string, st_mysql_lex_string, unsigned int, thr_lock_type, TABLE**, Open_tables_backup*) (in /usr/sbin/mysqld-debug)
| | | | | ->06.25% (388,424,352B) 0x186E3D3: Rpl_info_table::do_flush_info(bool) (in /usr/sbin/mysqld-debug)

4) At the end of the Rpl_info_table::do_init_info() function we see the call access->drop_thd(thd), whose code is the following:

void Rpl_info_table_access::drop_thd(THD *thd)
{
  DBUG_ENTER("Rpl_info::drop_thd");

  if (thd_created)
  {
    System_table_access::drop_thd(thd);
    thd_created= false;
  }

  DBUG_VOID_RETURN;
}

In our case thd_created is false, so the thd is not deleted and the memory managed by thd->mem_root is not deallocated.

That is why restarting the slave SQL thread helps: its thd is deleted and, as a result, the thd->mem_root memory is freed. This can be used as a workaround for this bug.
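The lifecycle described above can be condensed into a toy Python model (not MySQL code; Arena merely stands in for thd->mem_root, and the function names mirror the C++ ones for readability):

```python
# Toy model of the leak: allocations made through a *reused* thd accumulate
# in its arena until that thread dies, while a thd created and dropped by
# Rpl_info_table_access itself is freed on every cycle.
class Arena:                      # stands in for thd->mem_root
    def __init__(self):
        self.allocated = 0
    def alloc(self, n):
        self.allocated += n       # alloc_root: only grows, freed wholesale

class Thd:
    def __init__(self):
        self.mem_root = Arena()

def flush_info(current_thd):
    """One simplified flush cycle through create_thd()/drop_thd()."""
    thd_created = current_thd is None
    thd = Thd() if thd_created else current_thd   # create_thd()
    thd.mem_root.alloc(4096)      # lex_start allocates in thd->mem_root
    if thd_created:               # drop_thd(): frees only if *we* created it
        del thd                   # arena dies with the temporary thd
        return 0
    return thd.mem_root.allocated

slave_sql_thd = Thd()             # the replication SQL thread's thd
for _ in range(1000):             # one flush per replicated event group
    leaked = flush_info(slave_sql_thd)
print(f"bytes stuck in slave thd mem_root: {leaked}")
```

With current_thd set (the replication thread), every flush grows the arena; with current_thd absent, nothing accumulates, which matches why only the TABLE repository path driven by the SQL thread leaks.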

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

A simple MTR test to reproduce it:

*.test:
=======================
--source include/have_binlog_format_row.inc
--source include/have_gtid.inc

--echo *** Set up master (server_1) <-> master (server_2) replication ***
--let $rpl_topology= 1->2->1
--source include/rpl_init.inc
--echo

--connection server_2
--let $s2b=`select * from sys.memory_global_total`
--connection server_1
--let $s1b=`select * from sys.memory_global_total`

--disable_query_log
CREATE TABLE t1(a INT);
--let $i=3000
while($i)
{
--eval INSERT INTO t1 VALUES ($i)
--dec $i
}
DROP TABLE t1;
--enable_query_log

--source include/rpl_sync.inc
--connection server_2
--let $s2a=`select * from sys.memory_global_total`
--connection server_1
--let $s1a=`select * from sys.memory_global_total`
--echo "server_1: $s1b $s1a"
--echo "server_2: $s2b $s2a"

# Cleanup
--source include/rpl_end.inc
=================================

*.cnf:
=================================
!include ../my.cnf

[mysqld.1]
sync_relay_log_info=1
enforce-gtid-consistency=ON
gtid-mode=ON
master-info-repository=TABLE
relay-log-info-repository=TABLE
log-slave-updates

[mysqld.2]
sync_relay_log_info=1
enforce-gtid-consistency=ON
gtid-mode=ON
master-info-repository=TABLE
relay-log-info-repository=TABLE
log-slave-updates
==================================

It can be placed in "mysql-test/suite/rpl/t".

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-990
