Last week I had to confront one of those situations where you can’t really tell what is going on with a piece of software, and the final conclusion would sound completely crazy if postulated as the initial hypothesis. The regular MySQL commands and utilities fall short in these cases, so I had to resort to the three tools reviewed in this article.
The problem we were diagnosing was this: at some point in time, a number of queries that used to take less than a minute to execute started taking between five and 15 minutes. We needed insight into what was going on inside the MySQL server.
MySQL Tuner
At some point in a long diagnosis process, MySQL's SHOW [GLOBAL] VARIABLES and SHOW [GLOBAL] STATUS become nothing more than a long list of numbers. Going through a teammate's notes on another issue, I came across MySQL Tuner. This is an extremely simple tool that takes the information from the SHOW statements mentioned above and combines it in a useful way. Sample run:
>> MySQLTuner 1.0.0 - Major Hayden <[email protected]>
>> Bug reports, feature requests, and downloads at https://mysqltuner.com/
>> Run with '--help' for additional options and output filtering
[!!] Successfully authenticated with no password - SECURITY RISK!

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.51a-3ubuntu5.4
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in InnoDB tables: 16K (Tables: 1)
[OK] Total fragmented tables: 0

-------- Performance Metrics -------------------------------------------------
[--] Up for: 9h 0m 49s (70 q [0.002 qps], 13 conn, TX: 24K, RX: 2K)
[--] Reads / Writes: 100% / 0%
[--] Total buffers: 58.0M global + 2.6M per thread (100 max threads)
[OK] Maximum possible memory usage: 320.5M (18% of installed RAM)
[OK] Slow queries: 0% (0/70)
[OK] Highest usage of available connections: 1% (1/100)
[OK] Key buffer size / total MyISAM indexes: 16.0M/48.0K
[!!] Query cache efficiency: 0.0% (0 cached / 7 selects)
[OK] Query cache prunes per day: 0
[OK] Temporary tables created on disk: 0% (0 on disk / 8 total)
[OK] Thread cache hit rate: 92% (1 created / 13 connections)
[OK] Table cache hit rate: 86% (40 open / 46 opened)
[OK] Open file limit used: 7% (78/1K)
[OK] Table locks acquired immediately: 100% (53 immediate / 53 locks)
[OK] InnoDB data size / buffer pool: 16.0K/8.0M

-------- Recommendations -----------------------------------------------------
General recommendations:
    MySQL started within last 24 hours - recommendations may be inaccurate
    Enable the slow query log to troubleshoot bad queries
Variables to adjust:
    query_cache_limit (> 1M, or use smaller result sets)
The beauty and usefulness of this tool lie in its simplicity. In the case at hand, its report allowed us to immediately rule out temporary tables and the values of the various buffers and caches as the source of our problem.
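For a sense of what the script is summarizing, this is the raw material it works from; the LIKE patterns below are just examples of how you would pull individual counters by hand:

-- Raw form of the data MySQL Tuner aggregates: SHOW STATUS and SHOW VARIABLES
-- return hundreds of name/value rows that the script cross-references for you.
-- The LIKE patterns here are only examples.
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
SHOW GLOBAL VARIABLES LIKE 'query_cache%';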
Maatkit’s Query Profiler
This tool is part of the Maatkit toolset and provides timing information for the queries in a SQL script file. This gives a lot of flexibility in the kinds of tests you can perform, for example running the same set of queries with different buffer sizes (a sketch of such a script follows the sample run). Sample run:
./mk-query-profiler.pl --database data test.sql

+----------------------------------------------------------+
|                      2 (0.7788 sec)                      |
+----------------------------------------------------------+

__ Overall stats _______________________ Value _____________
   Total elapsed time                        1.178
   Questions                                19
     COMMIT                                  0
     DELETE                                  0
     DELETE MULTI                            0
     INSERT                                  0
     INSERT SELECT                           0
     REPLACE                                 0
     REPLACE SELECT                          0
     SELECT                                  2
     UPDATE                                  0
     UPDATE MULTI                            0
   Data into server                       1648
   Data out of server                     3808
   Optimizer cost                       164373.559

__ Table and index accesses ____________ Value _____________
   Table locks acquired                     40
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                          156976
   Range sorts                               2
   Merge passes                              6
   Table scans                               0
   Potential filesorts                       0
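As an illustration of the buffer-size comparison mentioned above, a test.sql for the profiler could look roughly like this; the table, column, and buffer values are hypothetical, not the ones from our case:

-- Hypothetical test.sql: the same SELECT run under two different
-- sort_buffer_size settings so mk-query-profiler can compare the runs.
SET SESSION sort_buffer_size = 2097152;   -- 2M
SELECT o.OID, c.name
FROM orders o
JOIN customers c ON c.CID = o.CID
ORDER BY c.name;

SET SESSION sort_buffer_size = 8388608;   -- 8M
SELECT o.OID, c.name
FROM orders o
JOIN customers c ON c.CID = o.CID
ORDER BY c.name;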
Running the queries against the data from before and after the incident, we could see that the number of table scans and potential filesorts differed. This was the first clue that the problem was in the queries' execution plan. We prepared scripts with different SELECT alternatives and were able to quickly compare the changes in these numbers.
Maatkit’s Visual Explain
Another tool from the Maatkit set, this one simply represents the EXPLAIN output as a tree drawn with regular printable characters. Sample output:
./mk-visual-explain.pl --database data -c test_expl.sql
JOIN
+- Bookmark lookup
|  +- Table
|  |  table          table1
|  |  possible_keys  PRIMARY
|  +- Unique index lookup
|     key            table1->PRIMARY
|     possible_keys  PRIMARY
|     key_len        4
|     ref            data.table0.CID
|     rows           1
+- JOIN
   +- Unique index lookup
   |  key            table5->PRIMARY
   |  possible_keys  PRIMARY
   |  key_len        4
   |  ref            data.table0.STID
   |  rows           1
   +- JOIN
      +- Filter with WHERE
      |  +- Bookmark lookup
      |     +- Table
      |     |  table          table4
      |     |  possible_keys  PRIMARY,Index_5,Index_6
      |     +- Unique index lookup
      |        key            table4->PRIMARY
      |        possible_keys  PRIMARY,Index_5,Index_6
      |        key_len        8
      |        ref            data.table0.OID,data.table0.CID
      |        rows           1
      +- JOIN
         +- Filter with WHERE
         |  +- Bookmark lookup
         |     +- Table
         |     |  table          table0
         |     |  possible_keys  PRIMARY,OID_idx
         |     +- Index lookup
         |        key            table0->OID_idx
         |        possible_keys  PRIMARY,OID_idx
         |        key_len        4
         |        ref            data.forpeop2_.OPID
         |        rows           1
         +- JOIN
            +- Index lookup
            |  key            table2->PRIMARY
            |  possible_keys  PRIMARY,Index_2,Index_3
            |  key_len        4
            |  ref            data.table3.FPID
            |  rows           1
            +- Filesort
               +- Filter with WHERE
                  +- Index lookup
                     key            table3->Index_5
                     possible_keys  PRIMARY,Index_5
                     key_len        1
                     ref            const
                     rows           36816
In our case, this output provided two important pieces of information: the slow servers showed a temporary table at the very beginning of the execution plan, while the fast servers did not. This discrepancy between the servers explained the timing differences.
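The same symptom can be seen in plain EXPLAIN output; a minimal sketch with placeholder table and column names (not our actual query):

-- A temporary table in the plan shows up in EXPLAIN as "Using temporary"
-- (usually together with "Using filesort") in the Extra column; the tree
-- view just draws those steps as nodes. Names below are placeholders.
EXPLAIN
SELECT o.OID, s.status_name
FROM orders o
JOIN statuses s ON s.STID = o.STID
ORDER BY s.status_name;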
Looking at the good plan, we were able to rewrite the query, hinting the optimizer to use the proper indexes. Running this modified SELECT through both the Query Profiler and Visual Explain on the slow servers corroborated that the problem was in the optimizer and not in the data itself (mysqlcheck didn't reveal any corruption either).
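The rewrite amounted to adding index hints; a minimal sketch of the mechanism with placeholder table and index names (not our actual query):

-- FORCE INDEX makes the optimizer use the named index when reading the
-- table (USE INDEX is the softer variant). Table, column, and index
-- names are placeholders.
SELECT o.OID, c.name
FROM orders o FORCE INDEX (OID_idx)
JOIN customers c ON c.CID = o.CID
WHERE o.OID = 12345;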
Conclusion
None of these three tools show hidden information or use obscure algorithms (all three are Perl scripts); they just present the existing data in a clever format. When I am looking at complex information or subtle differences in the data, these are the kinds of tools that make my life easier.
Appendix: Note on the Actual Problem
As noted above, we found an issue in the MySQL optimizer. In about 8M worth of data (out of 170M), the internal MySQL statistics changed enough to make the optimizer choose a bad plan. We ran ANALYZE TABLE and OPTIMIZE TABLE without any positive results. Unfortunately we can't file a bug report because the data and database structure are proprietary. At the time of the incident, the server in use was version 5.0.17.
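For reference, these are the statements we re-ran, shown here against a placeholder table name; neither brought back a good plan:

-- ANALYZE TABLE rebuilds index statistics; OPTIMIZE TABLE rebuilds and
-- defragments the table. The table name is a placeholder for the
-- affected tables.
ANALYZE TABLE table0;
OPTIMIZE TABLE table0;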
Comments
Any 5.0.xx lower than 32 should be considered highly dangerous, and not suitable for production use. Lots of nasty bugs everywhere, including with InnoDB, the optimiser, and replication. Hmm, that’s about all the key bits you really need to work well ;-)
Sometimes old(er) versions are fine, sometimes they are not. IMHO 5.0.17 is a case of not.
Although the data files were originally created under 5.0.17, one of the first steps we took was to upgrade to 5.0.67, and the ANALYZE and OPTIMIZE commands were run under that release. So either the damage done earlier couldn't be repaired, or the bug is still present in 5.0.67.