The Maatkit toolkit is a real blessing for the MySQL DBA. And while its documentation is pretty good, in some cases you need to read it carefully a second and third time to make sure you are not missing an important piece of information. In this article I will comment on mk-table-checksum and mk-table-sync. My comments are aimed mostly at DBAs who are considering using these utilities with medium-sized or larger databases.
--replicate
This option allows you to store the checksum results on the master, in a table that will get replicated to the slaves. Although it might seem like overhead for a simple check, it really simplifies your life, especially when used in combination with mk-table-sync. I always use it, for the following reasons:
- You only need to run mk-table-checksum on the master.
- A simple query will tell you the slave status (see the example query after this list).
- When used with --chunksize, it divides the checking and syncing into manageable portions.
- It's the best way to get consistent checks between master and slaves.
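For example, once the results have replicated, a query along these lines (adapted from the one suggested in the tool's documentation, and using the column names of the checksum table defined below) will show which chunks differ on a slave:

-- Run on each slave: lists chunks whose row count or checksum
-- differs from the master's values.
SELECT db, tbl, chunk, this_cnt, master_cnt, this_crc, master_crc
FROM checksum
WHERE master_cnt <> this_cnt
   OR master_crc <> this_crc
   OR ISNULL(master_crc) <> ISNULL(this_crc);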
I always define the table as an InnoDB table to improve concurrency and avoid deadlocks (see the tool documentation for details). This table can be created in any database. Just make sure that it is going to be replicated. Here's an example of the table definition (straight from the documentation) with the InnoDB specification at the end:
CREATE TABLE checksum (
   db         char(64)  NOT NULL,
   tbl        char(64)  NOT NULL,
   chunk      int       NOT NULL,
   boundaries char(64)  NOT NULL,
   this_crc   char(40)  NOT NULL,
   this_cnt   int       NOT NULL,
   master_crc char(40)  NULL,
   master_cnt int       NULL,
   ts         timestamp NOT NULL,
   PRIMARY KEY (db, tbl, chunk)
) ENGINE = InnoDB;
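If your setup uses replication filters, it's worth confirming on each slave that the table actually made it across. A minimal sketch, assuming the table was created in the test database, as in the example run further below:

-- On a slave: confirm that checksum rows are arriving via replication
-- (assumes the table was created as test.checksum).
SELECT db, tbl, chunk, ts
FROM test.checksum
ORDER BY ts DESC
LIMIT 5;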
--chunksize
One of the concerns when checking big tables is the possibility of locking them for a significant time, and thus seriously disrupting the application using the database. Although it is always preferable to run big checks during maintenance windows, that’s not always possible.
My rule of thumb is to divide each table into approximately ten chunks. I have also found that chunks larger than 500,000 rows tend to be long-running (hence the 500000 * 10 threshold in the query below). So if you have concerns about performance or concurrency, or both, you will have to work out the proper chunk size for each table yourself. A colleague of mine came up with the following query to run against the Information Schema:
SELECT table_schema, table_name, table_rows,
       ROUND(table_rows/11, -5) AS chunk_size,
       ROUND(table_rows/(500000 * 10), 1) AS over_threshold_factor
FROM tables
WHERE table_rows > 500000 * 10
ORDER BY table_rows;
You can adjust it to suit your needs or local criteria. The advantage of this setting is more apparent when it is used with --replicate. In that case, each chunk is identified by the values of the key used to divide the table, making it easier to find and correct errors, whether automatically or manually.
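For instance, a quick look at the boundaries column shows exactly which key range a differing chunk covers. A hypothetical lookup, using the database and table names from the example run below:

-- Key ranges of chunks that differ from the master
-- (names taken from the example run in this article).
SELECT chunk, boundaries
FROM test.checksum
WHERE db = 'product'
  AND tbl = 'damaged_table'
  AND (master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc));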
--databases / --tables / --engine
Sometimes a given table and/or database might be too large, causing these tests to run for a very long time. In these cases, it's best to run the utility on a single database and/or table with options tailored to it. For example, use a bigger chunksize, double or triple that used for the rest of the tables. Other options that can be used instead of these are --ignoreengine, --ignoredb, and --ignoretable. Which one is the most appropriate will depend entirely on each use case.
--print (mk-table-sync only)
Running mk-table-sync with this option will print to stdout the INSERT/UPDATE/DELETE statements that it would apply to the server in question to synchronize it (remember that the utility can be used to restore a master from a slave backup or to bring a slave up to date). This way you can make sure that the operations to be applied are the ones you expected. You can also use the data to sample rows in the tables and/or databases in question.
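As a purely illustrative sketch (not actual tool output), the statements you would review might look something like this; the exact statements and values depend on the differences found:

-- Hypothetical examples of the kind of statements --print emits for review.
-- Column names here are invented for illustration.
UPDATE product.damaged_table SET status='active' WHERE id=1021 LIMIT 1;
DELETE FROM product.damaged_table WHERE id=1045 LIMIT 1;
INSERT INTO product.damaged_table (id, status) VALUES (1077, 'active');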
Due to issues in dealing with double-wide characters, I wouldn't use it to apply the changes to the database directly; mysqldump has similar issues.
Putting It All Together
The following is an example taken from real life, run on a table with 2.5 million rows. The database, table, and user names have been modified for privacy.
time perl mk-table-checksum --replicate=test.checksum --algorithm=BIT_XOR \
> localhost --user=mkuser --askpass \
> --databases product --tables damaged_table \
> --chunksize 300000 --emptyrepltbl
Enter password for localhost:
DATABASE TABLE         CHUNK HOST      ENGINE COUNT  CHECKSUM                         TIME WAIT STAT LAG
product  damaged_table     0 localhost InnoDB 600754 b738a6daf5a741b613248ca2477eedc0   18 NULL NULL NULL
product  damaged_table     1 localhost InnoDB 579800 b00b90ddf83d86718513eb9dad72861d   22 NULL NULL NULL
product  damaged_table     2 localhost InnoDB 540084 4928a70a6c2a7211f59d659e62466342   15 NULL NULL NULL
product  damaged_table     3 localhost InnoDB 512660 be14adf7b411d2b6d432e6af14975b0c   13 NULL NULL NULL
product  damaged_table     4 localhost InnoDB 410632 8757e1a8f39e4a5dddf29095725815f7   14 NULL NULL NULL
product  damaged_table     5 localhost InnoDB 167034 b5ebf8d23ab676c341d4eafd7bab35b9    5 NULL NULL NULL
product  damaged_table     6 localhost InnoDB   6605 af98b69f745c2eb0e81a485bbbea379f    0 NULL NULL NULL

real    1m41.142s
user    0m0.146s
sys     0m0.018s
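Since --replicate was used, the same results can also be read back from the checksum table itself, which is handy for scripting follow-up checks:

-- Per-chunk results stored by --replicate; on the slaves, the master_*
-- columns are populated once the checksum statements replicate.
SELECT chunk, boundaries, this_cnt, this_crc
FROM test.checksum
WHERE db = 'product' AND tbl = 'damaged_table'
ORDER BY chunk;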
The corresponding mk-table-sync command would be:
time ./mk-table-sync --print --chunksize 300000 --databases product \
--tables damaged_table --replicate=test.checksum \
--askpass --verbose --synctomaster u=mkuser,h=localhost
(There's the nuance of having a different way to specify host and user name for mk-table-sync, but I can live with that.)
Conclusion
These are powerful utilities for keeping your MySQL databases in good health. When dealing with medium-sized and larger installations, however, the default options will not be enough. You will need to factor in table sizes and concurrency issues, since it won't always be possible to take the database offline to perform maintenance, preventive or otherwise.