MySQL: RENAME TABLE on Transactional Tables Can Jeopardize Slave Data

Posted in: MySQL, Technical Track

Do you have a master-slave MySQL set up?  Ever do DDL changes on the master?  You may be hit with a serious data integrity bug.  Read on.

One of our clients does a regular rename tables on the master to keep the current table small and archive off old data.  We’d occasionally be hit by a ‘duplicate key’ error on the slave on the current table and have to resolve it manually.  Digging into the issue, I managed to replicate it on demand and filed bug 39675 with MySQL, which subsequently has been verified and slated for fix, though from what it seems only in version 6.0.  The bug affects all versions of MySQL from 4.1 to 6.0.

In a nutshell, here is what happens. The rename tables command only checks for pending transactions or locks in the current session.  If there is a pending transaction in another session on the table being renamed, the rename will succeed, but the order in which the transaction is written to the binlog will be different from the order in which the transactions were applied on the master.  This means that the data on the slave will now be out of sync for this table.

Here’s a test-case:

CREATE TABLE `raj` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB;

CREATE TABLE `raj_next` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB;

Run the following inserts: insert into raj (b) values (1),(2),(3);

Run the following transaction — but do not commit!

begin;
insert into raj(b) values (4);

Now, in a different session, run this:

rename table raj to raj_backup, raj_next to raj;

Back in the original session, commit it: commit;, and then see that the transaction made it to raj_backup:

select * from raj;
Empty set (0.00 sec)

select * from raj_backup;
mysql> select * From raj_backup;;
+----+------+
| id | b    |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
|  4 |    4 |
+----+------+
4 rows in set (0.00 sec)

and on the slave, you will see this:

select * from raj;
+----+------+
| id | b    |
+----+------+
|  4 |    4 |
+----+------+
1 row in set (0.00 sec)

select * from raj_backup;
mysql> select * From raj_backup;;
+----+------+
| id | b    |
+----+------+
|  1 |    1 |
|  2 |    2 |
|  3 |    3 |
+----+------+
3 rows in set (0.00 sec)

Check the binlog and you will see why this is so:

#080926 11:45:36 server id 3  end_log_pos 2356  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443936/*!*/;
CREATE TABLE `raj` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB/*!*/;
# at 2356
#080926 11:45:36 server id 3  end_log_pos 2550  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443936/*!*/;
CREATE TABLE `raj_next` (
`id` int(11) NOT NULL auto_increment,
`b` int(11) default NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB/*!*/;
# at 2550
#080926 11:45:40 server id 3  end_log_pos 2578  Intvar
SET INSERT_ID=1/*!*/;
# at 2578
#080926 11:45:40 server id 3  end_log_pos 2679  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443940/*!*/;
insert into raj (b) values (1),(2),(3)/*!*/;
# at 2679
#080926 11:45:40 server id 3  end_log_pos 2706  Xid = 74
COMMIT/*!*/;
# at 2706
#080926 11:45:51 server id 3  end_log_pos 2816  Query   thread_id=4
exec_time=0     error_code=0
SET TIMESTAMP=1222443951/*!*/;
rename table raj to raj_backup, raj_next to raj/*!*/;
# at 2816
#080926 11:45:59 server id 3  end_log_pos 2884  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443959/*!*/;
BEGIN/*!*/;
# at 2884
#080926 11:45:46 server id 3  end_log_pos 2912  Intvar
SET INSERT_ID=4/*!*/;
# at 2912
#080926 11:45:46 server id 3  end_log_pos 3004  Query   thread_id=3
exec_time=0     error_code=0
SET TIMESTAMP=1222443946/*!*/;
insert into raj(b) values (4)/*!*/;
# at 3004
#080926 11:45:59 server id 3  end_log_pos 3031  Xid = 76
COMMIT/*!*/;
DELIMITER ;

The row with the insert_id of 4 was committed after the rename.  While this made the row go into the old table on the master, it got written to the binlog after the rename and hence goes into the new table on the slave.

MySQL has acknowledged and verified the bug, but it isn’t slated for a fix any time soon, since the worklist is classified as ‘medium’ priority even though the bug itself is classified as severity S1 (critical), Triage: D2 (Serious) needs triage. https://bugs.mysql.com/bug.php?id=39675

and the worklist entry (https://forge.mysql.com/worklog/task.php?id=4284).

Split from WL#3726.
Implement transaction-long metadata locks. I.e. if a table is used in a transaction, a metadata lock should be kept for this table until the transaction is committed. This will prevent concurrent transactions from issuing a DDL against this table. Necessary to fix BUG#989 If DROP TABLE while there’s an active transaction, wrong binlog order

I don’t know if there is a workaround, but for now, it may make sense to review your code to make sure you don’t have an implementation that may be affected.

email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Raj has been managing and guiding highly effective teams for over 20 years. Raj combines strong leadership skills with a deeply technical background in IT consulting to bring tailored and efficient technology solutions to Pythian. Additionally, his focus on data security has helped Pythian meet and often exceed client and regulatory requirements. Prior to joining Pythian in 2003, Raj held various leadership positions with a proven record of success. He has a degree in Industrial Engineering and has worked in multiple IT and engineering fields.

2 Comments. Leave new

Gregory Haase
October 8, 2008 8:20 pm

I don’t know if this is the kind of work-around you’re looking for, but I used to have a similar problem with keeping my archive tables small. The solution is to use partitioning and drop older partitions instead.

Maybe your client isn’t on 5.1 yet, but that’s going to be GA a lot sooner than 6.0.

You can see my article about pruning archive tables with partitions here: https://dev.mysql.com/tech-resources/articles/partitioning-event_scheduler.html

Reply

Tangential:
“If MySQL encounters any errors in a multiple-table rename, it does a reverse rename for all renamed tables to return everything to its original state. ”

I notice that earlier this week and have been curious about what happens if the reverse rename fails. I bet it is nothing good. I’ll write up a test case over the weekend…

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *