A few weeks ago one of our customers reported a slave sql thread stopped with the following error: “Column 3 of table X cannot be converted from type decimal(0,?) to type decimal(4,3)” on one out of 3 slaves. The other two were already beyond the offending event and no errors were raised. This was not just another replication issue.
In this opportunity I wanted to share another “complex” replication error I encountered and how it was resolved. As it says above, we have one out of 3 slaves reporting that a decimal value could not be converted to … decimal?. Allow me to give you some more context to understand the problem.
The MySQL server version was 5.5 on all nodes. MySQL was showing that tables were using DECIMAL(4,3) on both master and slave, using any of the common methods (
DESC, SHOW CREATE TABLE, information_schema.COLUMNS, etc) but it was clear that something was different between them. After reviewing a related internal ticket, I found out that the tables on that slave were recently imported from a logical backup to enable file-per-table and that the error arose right after resuming replication. So, IF the data set was born before 5.0, AND the server was in-place upgraded AND the slave was seeded from a master’s physical backup AND the affected table was never recreated ‘from scratch’ on the master THEN the table on the master could still be using the old DECIMAL data type (MYSQL_TYPE_DECIMAL).
As a side note, and in case you are wondering, NO, altering/repairing a table is not enough to start using any new data types. MySQL will keep the same data types to avoid errors when copying the rows from the old to the new table as part of the ALTER process. Also, creating a new table using
CREATE TABLE ... LIKE will make the new table to use the exact same data types as the source table.
Now, considering that all of the above was true (and it was), we have a decent explanation of why this error started to appear after the table was recreated on the slave. However, there is another question and that is WHY the slave sql thread is trying to convert a perfectly valid value from the “old” decimal format to the “new” one (MYSQL_TYPE_NEWDECIMAL). If you think this has to do with RBR, you are on the right track.
The master server was using MIXED binary log format and the offending event was logged in ROW format. When an event is logged in this format, MySQL will validate that the data types of the columns on the slaves matches the master. And yes, MySQL considers MYSQL_TYPE_NEWDECIMAL to be different from MYSQL_TYPE_NEWDECIMAL.
Fixing the issue
There are two things that needs to be corrected in this scenario: first we need to make DECIMAL format on the master and the slave to match by recreating the table on the former and second, we need to resume replication.
If the affected table is not heavily written, and the offending event is simple, you could resume replication by executing the event’s statements manually on the slave to then skip the event. If the events are huge and the table is being modified often (which was my case), perhaps a good solution is to ignore that table and then restore it from the master once it was recreated on the master. Setting
slave_type_conversions won’t do the trick as converting from old decimal to new decimal is not a supported attribute promotion. Also, to completely fix the problem, it’s a good idea to find which other tables could fall into the same problem by checking for other tables on the master using DECIMAL columns that are still using the old format. That information is stored on the table’s frm file and could be exposed using dbsake’s frmdump.
root@mypc:~# ./dbsake frmdump -t /var/lib/mysql/frm/tab1.frm -- -- Table structure for table `tab1` -- Created with MySQL Version 5.5.47 -- CREATE TABLE `tab1` ( `col1` decimal(5,2) DEFAULT NULL /* MYSQL_TYPE_NEWDECIMAL */, `tinyin` tinyint(4) DEFAULT NULL /* MYSQL_TYPE_TINY */, `bigin` bigint(20) DEFAULT NULL /* MYSQL_TYPE_LONGLONG */ ) ENGINE=InnoDB DEFAULT CHARSET=latin1; root@mypc:~# ./dbsake frmdump -t /var/lib/mysql/frm/tab1old.frm -- -- Table structure for table `tab1old` -- Created with MySQL Version 5.5.47 -- CREATE TABLE `tab1old` ( `col1` decimal(5,2) DEFAULT NULL /* MYSQL_TYPE_DECIMAL */, `tinyin` tinyint(4) DEFAULT NULL /* MYSQL_TYPE_TINY */, `bigin` bigint(20) DEFAULT NULL /* MYSQL_TYPE_LONGLONG */ ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Recreating the table on the master
To recreate the table, we can use pt-online-schema change which will not only create the table from scratch but will also perform this task automatically and with minimum locking. A dummy alter like ENGINE=<current engine> will be enough unless..
One last problem…
You might run into another issue related to the old DECIMAL format while recreating the table using pt-online-schema-change: the old format allocates one more character than the precision specified to store the value’s sign. This fact allows you to store positive numbers one digit longer from what’s specified for the column precision. For example, if the precision is (5,2), you will be able to store numbers with a four digit integer part (ie. 4321,12). This is no longer true for
MYSQL_TYPE_NEWDECIMAL and you might probably receive “Out of range” errors from pt-online-schema-change while copying the data to the new table. Depending of the version of the tool, you might need to execute it with PTDEBUG=1 to see the error.
....... # pt_online_schema_change:10337 2996 SHOW WARNINGS # Retry:3671 2996 Try code failed: Copying rows caused a MySQL error 1264: # Level: Warning # Code: 1264 # Message: Out of range value for column 'column1' at row 5 # Query: INSERT LOW_PRIORITY IGNORE INTO ... .......
To workaround this, you could extend the column precision by one digit to allow any larger values to be stored on the new table. For example, if your current column precision is (4,3), you could change it to (5,3) using pt-online-schema-change, fixing both problems with a single execution.
Perhaps this issue is one of many good reasons why MySQL always recommends logical over in-place upgrades. Any new data types will be used only when the table is created from scratch.
Regarding fixing the problem, of course you could always reseed the slave from a physical backup of the master and forget all this ever happened, but if you don’t want to loose any work made on the slave or if the slave cannot be easily reseeded, then the above may be the best method to achieve resolution.