So, how does one gather statistics on indexes? With InnoDB it’s one thing, with MyISAM it’s another thing. This post however, will focus on MyISAM. There are various ways, each having their own drawbacks and positive sides.
ANALYZE TABLE? It’s great — very fast in InnoDB. But with MyISAM a lock occurs while scanning the table.
OPTIMIZE TABLE? Still relatively quick with MyISAM, unless the table in question is huge. InnoDB requires an
ALTER TABLE — it might take forever, depending on the server configuration, as
OPTIMIZE TABLE for InnoDB maps to a
ALTER TABLE tableName ENGINE=InnoDB
We all know how
REPAIR TABLE, and
OPTIMIZE TABLE work with MyISAM. However, there’s a less explored way, for MyISAM — the
myisam_stats_method is used to rebuild the statistics of the indexes, which again is used by the optimizer to make smart choices, such as what index to use for a given query.
This allows you to change the way in which MyISAM table index statistics are gathered. The default is
nulls_unequal, with which every
NULL is not considered the same. Instead of being grouped together with other NULL values, each NULL forms its own group of size
1. This might make the optimizer look for ref when other joins are better than that, simply because of the amount of
NULL groups with size
1. A group, in this context, is rows with the same value.
The other option,
nulls_equal, is when all
NULL values are treated as equals, which means they form a single value group which will (potentially) be much larger than the average value group, and might offset the optimizer, deeming the group less useful than it really is, and thus not use the ref method when it really should.
For index rebuilding (really regenerating the statistics), there is a quick way: change the variable
nulls_equal, and issue a
check table <table>. This will rebuild the statistics of the indexes. Then change the stats method again, and go through the same process to get the normal process. Not the most efficient method, but in some cases, it might save you a lot of valuable time, and also downtime, by being a lot faster than other options.