This is the final installment of a five part blog series to explore InnoDB internals by looking at the related tunable system variables. In this section we’re going to cover variables that relate to enforcing data consistency, and how index statistics are handled and stored.
Just like previous sections, I would like to emphasize something that was written in part one of this blog post series.
“I should note that while tuning recommendations are provided, this objective of this blog post series was NOT meant to be a tuning primer, but instead to explore the mechanics that each variable interacts with. As such I would like to advise against reading this guide and trying to fine tune all of the available InnoDB variables. System variable tuning is an exercise in diminishing returns, the most benefit you’ll get out of tuning your MySQL server will occur within the first 15 minutes of configuration. In order to help prevent excess tuning I will provide a list of variables out of each section that should be set when configuring a new system.”
Let’s take a look at the remaining tunable InnoDB system variables.
Definition: If enabled, foreign keys are observed. If disabled, foreign keys are ignored.
Default: 1 (ON)
Tuning notes: You can disable this feature when you don’t need to have FKs observed. This Most commonly occurs when you need to do a large data import or restore a logical backup where you know that the data that’s being loaded is consistent and doesn’t require validation on record insertion. This can be disabled at the session level. If you are consider removing this feature permanently in order to reduce the overhead cost of foreign keys, you may want to consider removing the foreign keys instead.
Definition: InnoDB determines its query execution plan based on statistics that are gathered for indexes that are part of the table(s) being queried. This variables allows you to store those statistics in a persistent fashion (found in the mysql.innodb_table_stats and mysql.innodb_index_stats tables) so that when MySQL is restarted you don’t have to recalculate the statistics the first time the table is accessed.
Tuning notes: The overhead of writing this information to disk is minimal and can save you unnecessary index checks after a MySQL restart. I would recommend leaving it on. If you feel like there are certain tables where this extra overhead is not necessary, you can disable it per table using the stats_persistant clause of the create table or alter table statements.
Associated Metrics: If you have file per table enabled, you can use the performance schema table file_summary_by_instance and look up read and write information for the files associated with mysql.innodb_table_stats and mysql.innodb_index_stats.
innodb_stats_persistent_sample_pages / innodb_stats_transient_sample_pages
Definition: The main driving property behind and index’s statistic is its cardinality. Cardinality is the number of unique values that you find in a given index. The more unique values, the higher the cardinality, and ultimately the more likely an index is going to be selected for use by the MySQL optimization engine. Though it should be noted that high cardinality is not always a guarantee that the optimization engine will use the index, always check your EXPLAIN output to see what the optimizer is doing.
These variables are used to specify how many table pages are analyzed when estimating the cardinality for a specific index. This is typically referenced only when running a command that would force MySQL to estimate the cardinality of the index, like ANALYZE TABLE.
If innodb_stats_persistent (see above) is enabled, then you would would use innodb_stats_persistent_sample_pages to specify how many pages should be sampled. If innodb_stats_persistent is disabled, then you would use innodb_stats_transient_sample_pages.
innodb_stats_persistent_sample_pages = 20 (pages)
Innodb_stats_transient_sample_pages = 8 (pages)
Tuning notes: Typically running ANALYZE TABLE is a very fast operation, and this is because it’s only sampling a very small portion of the data available. For example, if you have innodb_stats_persistent enabled, have innodb_stats_persistent_sample_pages at it’s default value of 20, and are using the standard 16k page size, then you’re only sampling 320k of data. This can be okay if you’re estimating index cardinality on a small table, but on larger tables this might be ineffective. Using the associated metrics below you can see how accurate the estimation is and if you find that that the estimated index cardinality is inaccurate to the point where it’s outside of acceptable tolerances, then run analyze table to get a new sample. If the estimation is still too inaccurate, then you can increase this value and run ANALYZE TABLE again. However, increasing this value will increase the amount of time it takes to estimate index cardinality.
Associated Metrics: The best way to determine if these variables need to be adjusted is by checking the estimated cardinality by looking at the SHOW INDEX output for the index in question. You can detect the actual cardinality of the index by using the following query where the listed column is the column covered by the index.
SELECT COUNT(DISTINCT col1) FROM TABLE;
Definition: This variables allows you to enable or disable a feature that recalculates index statistics and updates the mysql.innodb_table_stats and mysql.innodb_index_stats tables accordingly when 10% of the table’s data has been changed.
Tuning notes: This likely isn’t going to cause a lot of overhead for slow or consistently growing tables. Be sure to note that this only applies to tables that are affected by innodb_stats_persistent, so if you have a table that has a lot of massive changes, like a table that’s being loaded for temporary processing before being emptied or changed entirely, change the stats_persistant property of the table itself.
Associated Metrics: See the metrics section for the innodb_stats_persistent variable, this would fall under the same metric.
Definition: The variable controls whether InnoDB will recalculate index statistics when you use metadata statements like ‘show table status’, ‘show index’ or when you try to get information about a table using the tables information_schema.tables or information_schema.statistics. Unlike innodb_stats_auto_recalc, this variable is in no way impacted by whether innodb_stats_persistent is enabled or disabled as this has nothing to do with the persistent storage of index statistics, it’s just for the cache that’s storing that data at the moment.
Default: OFF (This was ‘ON’ in earlier versions of 5.6)
Tuning notes: I would recommend leaving this off in case you need to troubleshoot a production issue by looking at the associated information_schema tables. It can also be detrimental for schemas that have a large number of table.
Associated Metrics: If you have innodb_stats_on_metadata enabled and you want to know how often stats are being collected, I would recommend using the performance schema table events_statements_summary_by_digest and search for queries with a digest text that references the commands listed above that would trigger the statistics collection.
Out of the variables listed above, I would recommend you set the following when configuring a new MySQL instance.
I hope you enjoyed this series and have a better understanding of what InnoDB is doing behind the scenes. Please free free to comment on any part of the series if you would like to start a dialogue on any of the topics discuss.
Again, I would like to thank Ivan Groenwold for his contributions and assistance in the creation of this blog series.