5.1.60-13.1 - /usr/sbin/mysqld: malloc(): memory corruption

Bug #908531 reported by piavlo
This bug affects 5 people
Affects: Percona Server (moved to https://jira.percona.com/projects/PS)
Status tracked in 5.7

Series  Status   Importance  Assigned to  Milestone
5.1     Triaged  High        Unassigned
5.5     Triaged  High        Unassigned
5.6     New      Undecided   Unassigned
5.7     New      Undecided   Unassigned

Bug Description

Hi,
Since I upgraded to 5.1.60-13.1 on two servers, the following crashes have been recurring on both of them.

--------------------
111221 3:38:33 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=209715200
read_buffer_size=2097152
max_used_connections=6
max_threads=1400
threads_connected=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5954665 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x1364a2f0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x4a0d20f8 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x39)[0x8a21b9]
/usr/sbin/mysqld(handle_segfault+0x350)[0x5bb7e0]
/lib64/libpthread.so.0[0x2aaaaacd6b10]
/lib64/libc.so.6(gsignal+0x35)[0x2aaaab9f5265]
/lib64/libc.so.6(abort+0x110)[0x2aaaab9f6d10]
/lib64/libc.so.6[0x2aaaaba2f99b]
/lib64/libc.so.6[0x2aaaaba380fe]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x2aaaaba39e2e]
/usr/sbin/mysqld(my_malloc+0x32)[0x88fa32]
/usr/sbin/mysqld(init_alloc_root+0x6e)[0x8906ae]
/usr/sbin/mysqld(_Z14init_sql_allocP11st_mem_rootjj+0x15)[0x56e0a5]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjj+0x3d)[0x61018d]
/usr/sbin/mysqld(_Z28open_and_lock_tables_derivedP3THDP10TABLE_LISTb+0x67)[0x6108e7]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyyb+0x64)[0x653694]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x30bb)[0x5cd37b]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x51b)[0x5d08cb]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x3c8)[0x67e728]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x112)[0x6e2922]
/usr/sbin/mysqld(handle_slave_sql+0x8f7)[0x6e7d67]
/lib64/libpthread.so.0[0x2aaaaacce73d]
/lib64/libc.so.6(clone+0x6d)[0x2aaaaba994bd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x2aadd84d6c05): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
--------------------------

Another one

--------------------------
111221 3:44:42 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=209715200
read_buffer_size=2097152
max_used_connections=0
max_threads=1400
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5954665 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0xe6df70
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x4aad30f8 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x39)[0x8a21b9]
/usr/sbin/mysqld(handle_segfault+0x350)[0x5bb7e0]
/lib64/libpthread.so.0[0x2aaaaacd6b10]
/lib64/libc.so.6(gsignal+0x35)[0x2aaaab9f5265]
/lib64/libc.so.6(abort+0x110)[0x2aaaab9f6d10]
/lib64/libc.so.6[0x2aaaaba2f99b]
/lib64/libc.so.6[0x2aaaaba380fe]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x2aaaaba39e2e]
/usr/sbin/mysqld(my_malloc+0x32)[0x88fa32]
/usr/sbin/mysqld(init_alloc_root+0x6e)[0x8906ae]
/usr/sbin/mysqld(_Z14init_sql_allocP11st_mem_rootjj+0x15)[0x56e0a5]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjj+0x3d)[0x61018d]
/usr/sbin/mysqld(_Z28open_and_lock_tables_derivedP3THDP10TABLE_LISTb+0x67)[0x6108e7]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyyb+0x64)[0x653694]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x30bb)[0x5cd37b]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x51b)[0x5d08cb]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x3c8)[0x67e728]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x112)[0x6e2922]
/usr/sbin/mysqld(handle_slave_sql+0x8f7)[0x6e7d67]
/lib64/libpthread.so.0[0x2aaaaacce73d]
/lib64/libc.so.6(clone+0x6d)[0x2aaaaba994bd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x1352e405): /*
  processPosCoeffs.sql is used to enter estimated position coefficients into a production table which
  is used in prediction accumulation, and into a log table for analysis purposes
*/

DELETE FROM opt111221 3:44:42 [Note] Slave I/O thread: connected to master '<email address hidden>:3306',replication started in log 'mysql-bin.000492' at position 438362435
imization.position_coeffs WHERE optimization_id = 24
Connection ID (thread ID): 2
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
--------------------------

This is what I see in the terminal when mysqld crashes:

--------------------
*** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption: 0x000000001352e570 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2aaaaba380fe]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x2aaaaba39e2e]
/usr/sbin/mysqld(my_malloc+0x32)[0x88fa32]
/usr/sbin/mysqld(init_alloc_root+0x6e)[0x8906ae]
/usr/sbin/mysqld(_Z14init_sql_allocP11st_mem_rootjj+0x15)[0x56e0a5]
/usr/sbin/mysqld(_Z11open_tablesP3THDPP10TABLE_LISTPjj+0x3d)[0x61018d]
/usr/sbin/mysqld(_Z28open_and_lock_tables_derivedP3THDP10TABLE_LISTb+0x67)[0x6108e7]
/usr/sbin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP10SQL_I_ListI8st_orderEyyb+0x64)[0x653694]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x30bb)[0x5cd37b]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x51b)[0x5d08cb]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x3c8)[0x67e728]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x112)[0x6e2922]
/usr/sbin/mysqld(handle_slave_sql+0x8f7)[0x6e7d67]
/lib64/libpthread.so.0[0x2aaaaacce73d]
/lib64/libc.so.6(clone+0x6d)[0x2aaaaba994bd]
======= Memory map: ========

-------------------------------
*** glibc detected *** /usr/sbin/mysqld: malloc(): memory corruption: 0x00002aac145222c0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x2aaaaba380fe]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x2aaaaba39e2e]
/usr/sbin/mysqld(my_malloc+0x32)[0x88fa32]
/usr/sbin/mysqld(alloc_root+0x7e)[0x89022e]
/usr/sbin/mysqld(_Z10MYSQLparsePv+0x5db3)[0x5ebfb3]
/usr/sbin/mysqld(_Z9parse_sqlP3THDP12Parser_stateP19Object_creation_ctx+0x9d)[0x5c492d]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x46f)[0x5d081f]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x3c8)[0x67e728]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x112)[0x6e2922]
/usr/sbin/mysqld(handle_slave_sql+0x8f7)[0x6e7d67]
/lib64/libpthread.so.0[0x2aaaaacce73d]
/lib64/libc.so.6(clone+0x6d)[0x2aaaaba994bd]
======= Memory map: ========
----------------------------

After I downgraded to 5.1.59-13.0, the problem went away.
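
For anyone comparing the two glibc backtraces above, here is a minimal Python sketch that splits each frame into module, symbol, offset and return address, then lists the frames the two traces share, counted from the thread entry point. The helper names (`parse_frames`, `common_outer_frames`) and the regular expression are my own illustration, not part of any MySQL or Percona tool, and the two traces are shortened excerpts of the ones pasted above.

```python
import re

# One backtrace line looks like:  module(symbol+0xOFFSET)[0xADDRESS]
# The "(symbol+offset)" part is missing when no symbol could be resolved.
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[\s]+)"
    r"(?:\((?P<symbol>[^+)]+)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(text):
    """Return one dict per recognisable frame line, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.groupdict())
    return frames


def common_outer_frames(trace_a, trace_b):
    """Frames with identical return addresses, walking up from the thread start."""
    shared = []
    for frame_a, frame_b in zip(reversed(trace_a), reversed(trace_b)):
        if frame_a["address"] != frame_b["address"]:
            break
        shared.append(frame_a)
    return list(reversed(shared))


# Excerpt of the first glibc backtrace (crash while opening tables for DELETE).
TRACE_DELETE = """
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x30bb)[0x5cd37b]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x51b)[0x5d08cb]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x3c8)[0x67e728]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x112)[0x6e2922]
/usr/sbin/mysqld(handle_slave_sql+0x8f7)[0x6e7d67]
/lib64/libpthread.so.0[0x2aaaaacce73d]
/lib64/libc.so.6(clone+0x6d)[0x2aaaaba994bd]
"""

# Excerpt of the second glibc backtrace (crash inside the SQL parser).
TRACE_PARSE = """
/usr/sbin/mysqld(_Z9parse_sqlP3THDP12Parser_stateP19Object_creation_ctx+0x9d)[0x5c492d]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x46f)[0x5d081f]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x3c8)[0x67e728]
/usr/sbin/mysqld(_Z26apply_event_and_update_posP9Log_eventP3THDP14Relay_log_info+0x112)[0x6e2922]
/usr/sbin/mysqld(handle_slave_sql+0x8f7)[0x6e7d67]
/lib64/libpthread.so.0[0x2aaaaacce73d]
/lib64/libc.so.6(clone+0x6d)[0x2aaaaba994bd]
"""

shared = common_outer_frames(parse_frames(TRACE_DELETE), parse_frames(TRACE_PARSE))
for frame in shared:
    print(frame["module"], frame["symbol"] or "<no symbol>", frame["address"])
```

On these excerpts the shared frames run from clone up to Query_log_event::do_apply_event, which matches the observation that both crashes happen in the slave SQL thread while it applies a replicated statement. The traces diverge inside mysql_parse, so the corrupted allocation is detected at different points of statement processing.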

Tags: 5.1.60-13.1
piavlo (piavka)
tags: added: 5.1.60-13.1
removed: 5.1.59-13.0
piavlo (piavka) wrote :

Do you need more info in order to look into this?

Patrick Crews (patrick-crews) wrote :

Yes:
1) Could you describe your schemas / tables (quantity, composition, row counts, partitions, table engines, foreign keys)? Ballpark figures are enough to help us get closer to reproducing the problem.

2) Does the crash happen immediately on starting the server, or under load (if so, reads, writes, etc.)?

3) Any particular server options you are using would help as well.

Thanks!

Oleg Tsarev (tsarev)
Changed in percona-server:
status: New → Incomplete
assignee: nobody → Patrick Crews (patrick-crews)
piavlo (piavka) wrote :

1) Do you really want a 7K-line schema/table dump? There are about 300 tables.
All tables are InnoDB except 5 or 6, which are MyISAM. There are several foreign keys and partitions too.
2) The crash happens randomly, every X minutes or hours, and the server is automatically restarted by mysqld_safe.
    It happens on a slave that does only replication, in MIXED mode.
3) The options are:

max_connections = 1400
max_user_connections = 1400
max_connect_errors = 4294967295

server-id = 4

innodb_use_purge_thread = 1
innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
innodb_log_file_size = 512M
innodb_log_buffer_size = 4M
innodb_log_files_in_group = 2
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_file_per_table = 1
innodb_support_xa = 0
innodb_adaptive_flushing = 0

innodb_auto_lru_dump = 300

innodb_read_io_threads = 4
innodb_write_io_threads = 4
innodb_io_capacity = 200
innodb_read_ahead = linear
innodb_adaptive_checkpoint = estimate

innodb_log_block_size = 512
innodb_flush_neighbor_pages = 1

innodb_buffer_pool_size = 5G
innodb_additional_mem_pool_size = 32M

innodb_thread_concurrency = 4

auto_increment_increment = 2
auto_increment_offset = 1

log_slave_updates = 1
concurrent_insert = 1

replicate-same-server-id = 0
replicate-wild-ignore-table = mysql.%
replicate-wild-ignore-table = information_schema.%

log-bin = mysql-bin
log-bin-index = mysql-bin.index
sync_binlog = 0
expire_logs_days = 7
max_binlog_size = 1024M

relay_log = relay-bin
relay_log_index = relay-bin.index

log_bin_trust_function_creators = 1
binlog-format = MIXED

innodb_locks_unsafe_for_binlog = 1

innodb_lock_wait_timeout = 50
slave_transaction_retries = 10

innodb_overwrite_relay_log_info = 0

long_query_time = 15
log_slow_rate_limit = 1

slow_query_log = 1
slow_query_log_file = /var/lib//mysql/slow_queries.log
slow_query_log_microseconds_timestamp = 1

log_slow_slave_statements = 0

log_slow_verbosity = microtime,query_plan,innodb

log_slow_filter = full_scan,full_join

tmpdir = /ephemeral/mysql/tmp

userstat_running = 1
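
A side note on the listing above: innodb_lock_wait_timeout appears twice with the same value, which is harmless here. A minimal Python sketch for spotting such repeats in an option listing follows. The function name `repeated_options` is made up for illustration, and the sample is a five-line excerpt of the options above, not the whole file. Some options, such as replicate-wild-ignore-table, are legitimately given more than once, so a repeat is only something to look at, not an error.

```python
from collections import defaultdict


def repeated_options(config_text):
    """Map every option name that occurs more than once to its list of values."""
    values = defaultdict(list)
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and [section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        name, _, value = line.partition("=")
        # mysqld accepts '-' and '_' interchangeably in option names,
        # so normalise before comparing.
        values[name.strip().replace("-", "_")].append(value.strip())
    return {name: vals for name, vals in values.items() if len(vals) > 1}


# Excerpt of the options posted above.
SAMPLE = """
innodb_lock_wait_timeout = 50
replicate-wild-ignore-table = mysql.%
replicate-wild-ignore-table = information_schema.%
innodb_lock_wait_timeout = 50
slave_transaction_retries = 10
"""

for name, vals in repeated_options(SAMPLE).items():
    print(name, vals)
```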

piavlo (piavka) wrote :

You may notice that this config is for master-master replication, but the crash also happens with normal replication, in case that matters.

Changed in percona-server:
status: Incomplete → New
Patrick Crews (patrick-crews) wrote :

This should help us get closer to repeating things - thank you very much for sharing.
We'll be trying to create a test case / duplicate things for our devs later this week.

Alexey Kopytov (akopytov) wrote :

The same problem in 5.5 was reported as bug #915814.

PavelVD (pdobryakov) wrote :

We have hit this problem on more than 20 servers. Disabling the query cache and every possible log does not help. We were forced to roll the servers back by hand to version 57. This is an extremely critical issue that, by itself, can lead to data loss.

PavelVD (pdobryakov) wrote :

Do you need any additional information to determine the cause of this problem?

Oleg Tsarev (tsarev) wrote :

The fix for this bug has been merged to 5.5.
A fix has been proposed for 5.1 but has not yet been reviewed.
Thank you for your feedback. We will review the fix shortly, and the bug will be fixed in the next Percona Server release.

Alexey Kopytov (akopytov) wrote :

The identical issue has been reported against 5.5.18, which has the fix for this bug included. So it cannot be a duplicate of bug #705688. Or bug #705688 has not been really fixed in 5.5.18. This should be clarified before changing the status of this bug. Reverting the original status.

Oleg Tsarev (tsarev) wrote :

Can you please provide the following information:

1) The exact query that triggered the crash (if you have a specific one)
2) The binary log that triggered the crash, with some of the events before it
3) The schema of your tables (if that is possible, of course)

If this problem is related to the query cache enhancement, then disabling the query cache should help.

Oleg Tsarev (tsarev) wrote :

Let me provide feedback.

Memory allocation in MySQL changed between 5.1.59 and 5.1.60:

diff -ruN mysql-5.1.59/sql/sql_cache.cc mysql-5.1.60/sql/sql_cache.cc
--- mysql-5.1.59/sql/sql_cache.cc 2011-08-11 22:52:53.000000000 +0900
+++ mysql-5.1.60/sql/sql_cache.cc 2011-10-30 03:09:47.000000000 +0900
@@ -1251,7 +1251,7 @@
       DBUG_PRINT("qcache", ("No active database"));
     }
     tot_length= thd->query_length() + thd->db_length + 1 +
- QUERY_CACHE_FLAGS_SIZE;
+ sizeof(size_t) + QUERY_CACHE_FLAGS_SIZE;
     /*
       We should only copy structure (don't use it location directly)
       because of alignment issue

My fix https://code.launchpad.net/~tsarev/percona-server/5.1_fix_bug_856404/+merge/87203 for bug #924872 includes changes related to this feature.
Unfortunately, this fix was not merged into 5.1.60 :( That is the reason for your crash.

Oleg Tsarev (tsarev) wrote :

Please ignore my previous comment. By design, a query passed through the binary log cannot use the query cache (Statement Based Replication doesn't send SELECT queries, and the query cache works only with SELECT queries).

Oleg Tsarev (tsarev) wrote :

Statement Based Replication doesn't send "just SELECT" queries to the slave.
The Query Cache works with "just SELECT" queries (others are ignored).

The code related to query-cache-strip-comments doesn't allocate memory if
the query is "not a SELECT".

From this point of view I'm sure that this problem is not related to
query-cache-strip-comments.
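
To make the argument concrete, here is a minimal Python sketch of the check described above: skip leading comments, look at the first keyword, and treat only SELECT as a query cache candidate. This is a simplified illustration and not the server's actual code. The function names are hypothetical, version-specific `/*! ... */` comments are not handled, and the sample statement is a shortened, hypothetical form of the commented DELETE from the crash log.

```python
import re

# A leading comment: /* ... */, "-- ..." to end of line, or "# ..." to end of line.
LEADING_COMMENT_RE = re.compile(r"\s*(?:/\*.*?\*/|--[^\n]*\n|#[^\n]*\n)", re.DOTALL)


def first_keyword(sql):
    """Return the first SQL keyword after any leading comments, upper-cased."""
    pos = 0
    while True:
        match = LEADING_COMMENT_RE.match(sql, pos)
        if not match:
            break
        pos = match.end()
    keyword = re.match(r"\s*([A-Za-z]+)", sql[pos:])
    return keyword.group(1).upper() if keyword else ""


def is_query_cache_candidate(sql):
    """Only plain SELECT statements are considered by the query cache."""
    return first_keyword(sql) == "SELECT"


# Hypothetical statement shaped like the one in the crash log:
# a block comment followed by a DELETE.
replicated_statement = """/*
  explanatory comment
*/

DELETE FROM some_table WHERE id = 24"""

print(first_keyword(replicated_statement))
print(is_query_cache_candidate(replicated_statement))
```

A statement like this one, replayed by the slave SQL thread, is never a query cache candidate under this check, which is why the comment-stripping code should not allocate anything for it.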
