MySQL Crashes on DDL Statement: A Lesson on Purge Threads

Posted in: MySQL, Technical Track

Recently, several issues have been reported to me where DDL activity caused MySQL to crash. In one case it stemmed from dropping multiple databases, one after the other in rapid succession. But in the case I was dealing with directly, during an upgrade to MySQL 5.7, the crash was triggered by mysql_upgrade running an ALTER TABLE FORCE on a 2.2TB table in order to convert it to the new temporal format that supports microsecond precision.

The issue occurred after the intermediate table had been completely populated with all the necessary data, right as MySQL was about to swap the existing table for the intermediate one. After a period of time MySQL crashed, and the following InnoDB monitor output was found in the error log.

 


2017-11-19T00:22:44.070363Z 7 [ERROR] InnoDB: The age of the last checkpoint is 379157140, which exceeds the log group capacity 377483674.
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-11-19T00:22:57.447115Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140690671367936 has waited at srv0srv.cc line 1982 for 921.00 seconds the semaphore:
X-lock on RW-latch at 0x750a368 created in file dict0dict.cc line 1184
a writer (thread id 140690645923584) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /build/mysql-5.7-RrA758/mysql-5.7-5.7.20/storage/innobase/row/row0mysql.cc line 4305
2017-11-19T00:22:57.447181Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140690535433984 has waited at buf0flu.cc line 1209 for 660.00 seconds the semaphore:
SX-lock on RW-latch at 0x7ff5d4a5eaa0 created in file buf0buf.cc line 1460
a writer (thread id 140690645923584) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0sel.cc line 1335
Last time write locked in file /build/mysql-5.7-RrA758/mysql-5.7-5.7.20/storage/innobase/row/row0upd.cc line 2856
2017-11-19T00:22:57.447210Z 0 [Warning] InnoDB: A long semaphore wait:
--Thread 140690654582528 has waited at row0purge.cc line 862 for 923.00 seconds the semaphore:
S-lock on RW-latch at 0x750a368 created in file dict0dict.cc line 1184
a writer (thread id 140690645923584) has reserved it in mode exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 862
Last time write locked in file /build/mysql-5.7-RrA758/mysql-5.7-5.7.20/storage/innobase/row/row0mysql.cc line 4305
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending preads 0, pwrites 0
.....
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-11-19T00:23:27.448900Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2017-11-18 19:23:27 0x7ff51a7d9700 InnoDB: Assertion failure in thread 140690688153344 in file ut0ut.cc line 916


 

The key thing to note here is that a purge thread (row0purge.cc) was stuck in a semaphore wait on a data dictionary latch (dict0dict.cc), and that wait lasted longer than 600 seconds. Whenever a semaphore wait exceeds 600 seconds, MySQL intentionally crashes the server, as the fatal error above shows.

We noticed a few other things in the output. Specifically, there was a warning that the last checkpoint age had exceeded the capacity of the redo log group, indicating that redo was accumulating faster than InnoDB could flush the corresponding changes to persistent storage. This was definitely a bottleneck.
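As a rough illustration of the checkpoint-age warning, the numbers from the error log above can be compared against an approximate redo capacity. The 0.9 safety factor below is an approximation of how InnoDB derives its usable log group capacity; the exact constants vary by version, so treat this as a sketch, not InnoDB's actual formula:

```python
# Sketch: compare a checkpoint age against an approximate redo log
# group capacity. The 0.9 margin approximates InnoDB's internal
# safety factor; exact constants differ between versions.

def redo_capacity_bytes(log_file_size, log_files_in_group, margin=0.9):
    """Approximate usable redo capacity before InnoDB forces aggressive flushing."""
    return int(log_file_size * log_files_in_group * margin)

# Values reported by InnoDB in the error log above: the checkpoint age
# had already overrun the configured capacity, so flushing could not keep up.
checkpoint_age = 379_157_140
reported_capacity = 377_483_674
print(checkpoint_age > reported_capacity)  # True

# Hypothetical layout for the 8GB of redo space discussed below,
# e.g. two 4GB files -- roughly 7.7GB of usable capacity.
print(redo_capacity_bytes(4 * 1024**3, 2))
```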

In order to get past this issue, we increased the redo log space to 8GB, as testing indicated that this was roughly how much data would be written to the redo log during the ALTER TABLE FORCE process on a 2.2TB table. We also increased the amount of memory and the buffer pool size in order to speed up the ALTER TABLE FORCE process.
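A my.cnf fragment along these lines would implement the redo and buffer pool changes. The specific file count and buffer pool size here are illustrative assumptions, not the actual configuration used; both settings require a server restart in 5.7:

```ini
[mysqld]
# 8GB of redo capacity, e.g. two 4GB files (sizes are an assumption)
innodb_log_file_size      = 4G
innodb_log_files_in_group = 2

# Larger buffer pool to speed up the table rebuild
# (hypothetical value -- size to the host's available memory)
innodb_buffer_pool_size   = 64G
```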

One other key change, and the one that we believe resolved the issue, was lowering the variable innodb_purge_threads from the 5.7 default value of 4 to 1. If you check the MySQL reference guide you'll see a note stating that this setting should be kept low so that the purge threads do not contend with each other for access to busy tables. With a little googling you can also find bug reports like this one describing similar issues, where setting this variable to 1 was suggested.
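Since innodb_purge_threads is not a dynamic variable in 5.7, the change has to go into the configuration file and take effect on restart. A minimal sketch:

```ini
[mysqld]
# Revert to the 5.6 default so purge threads do not contend
# with each other on the same busy tables (restart required)
innodb_purge_threads = 1
```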

Conclusion

The takeaway from this is simply to suggest that if you are running MySQL 5.7, you will want to consider changing the variable innodb_purge_threads from its new default value of 4 back to the 5.6 default value of 1 to avoid purge thread contention. That is, unless you have a strong need for multiple purge threads. This also raises the question of why the default was changed from 1 to 4 in 5.7, given that 4 can be an unsafe value.
