Is slave_exec_mode idempotent actually idempotent?

Pythian Marketing

October 17, 2018

Tags: Google Cloud Platform (Gcp), Business Insights, Application Migration

Co-Author: Rafael Alvarez When you look up the word idempotence in wikipedia you get the following definition: Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. When I think of idempotence in the sense of how IT operations work, the very first thing that comes to mind is Ansible. In Ansible you have a yml file that represents a state you want the target host to be in. For example, I can use the yum module to ensure that MySQL is installed on my RHEL virtual machine. When the playbook is run, it checks if MySQL is installed. If not, it will install it. It is installed, then it will ignore that task in the playbook and move on. The point is that we’re instructing Ansible to ensure that the target host is in the desired state by the time the end of the playbook is reached. Another thing that we may think of as open source DBAs is the MySQL system server variable slave_exec_mode. This variable allows for two values, strict and idempotent. When you run into issues with a duplicate key or absent key errors in replication, strict mode will cause replication to fail until a user has corrected the issue with the data. When you use idempotent mode you would figure that replication would work in an idempotent fashion and ensure that the records reach the desired state. Now I know that this wouldn’t be possible using statement, or mixed based binary logging, and you would also need to ensure that you’re using a full binlog row image, thus limiting the scope of application, but you would figure that when something says it’s operating in an idempotent fashion, it would do so. Here’s an example. I have a 5.7 MySQL installation with a single master and a single slave configured with standard asynchronous replication. I have created a table and have inserted a couple of rows on the master. You’ll also note that binary logging is set to row and I am using a full binlog row image.

mysql> show global variables like 'binlog%';
 +--------------------------------------------+--------------+
 | Variable_name | Value |
 +--------------------------------------------+--------------+
 ....
 | binlog_format | ROW |
 ....
 | binlog_row_image | FULL |
 ....
 +--------------------------------------------+--------------+
 15 rows in set (0.00 sec)
 
 mysql> show create table t1\G
 *************************** 1. row ***************************
  Table: t1
 Create Table: CREATE TABLE `t1` (
  `c1` int(11) NOT NULL,
  `c2` int(11) DEFAULT NULL,
  PRIMARY KEY (`c1`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1
 1 row in set (0.00 sec)
 
 mysql> insert into t1 values (1, 2);
 Query OK, 1 row affected (0.00 sec)
 
 mysql> insert into t1 values (2, 3);
 Query OK, 1 row affected (0.00 sec)

On my slave, I have slave exec mode set to strict. After stopping replication, I deleted one of the two rows that were created on the master and replicated down to the slave.

mysql> show global variables like 'slave_ex%';
 +-----------------+--------+
 | Variable_name | Value |
 +-----------------+--------+
 | slave_exec_mode | STRICT |
 +-----------------+--------+
 1 row in set (0.02 sec)
 
 mysql> select * from t1;
 +----+------+
 | c1 | c2 |
 +----+------+
 | 1 | 2 |
 | 2 | 3 |
 +----+------+
 2 rows in set (0.00 sec)
 
 mysql> stop slave;
 Query OK, 0 rows affected (0.14 sec)
 
 mysql> delete from t1 where c1 = 2;
 Query OK, 1 row affected (0.12 sec)
 
 mysql> select * from t1;
 +----+------+
 | c1 | c2 |
 +----+------+
 | 1 | 2 |
 +----+------+
 1 row in set (0.00 sec)

Once this was done, I went back to the master and updated the record that was deleted on the slave. You can see in the binary log that the update is logged and the full row image is available.

mysql> update t1 set c2 = 4 where c1 = 2;
 Query OK, 1 row affected (0.07 sec)
 Rows matched: 1 Changed: 1 Warnings: 0
 
 mysql> select * from t1;
 +----+------+
 | c1 | c2 |
 +----+------+
 | 1 | 2 |
 | 2 | 4 |
 +----+------+
 2 rows in set (0.00 sec)
 
 [root@centos7-1 ~]# mysqlbinlog -vv /var/lib/mysql/centos7-1-bin.000003
 ....
 # at 1328
 #181012 8:27:44 server id 1 end_log_pos 1396 Query thread_id=9 exec_time=0 error_code=0
 SET TIMESTAMP=1539347264/*!*/;
 BEGIN
 /*!*/;
 # at 1396
 #181012 8:27:44 server id 1 end_log_pos 1438 Table_map: `idem`.`t1` mapped to number 108
 # at 1438
 #181012 8:27:44 server id 1 end_log_pos 1488 Update_rows: table id 108 flags: STMT_END_F
 
 BINLOG '
 QJPAWxMBAAAAKgAAAJ4FAAAAAGwAAAAAAAEABGlkZW0AAnQxAAIDAwAC
 QJPAWx8BAAAAMgAAANAFAAAAAGwAAAAAAAEAAgAC///8AgAAAAMAAAD8AgAAAAQAAAA=
 '/*!*/;
 ### UPDATE `idem`.`t1`
 ### WHERE
 ### @1=2 /* INT meta=0 nullable=0 is_null=0 */
 ### @2=3 /* INT meta=0 nullable=1 is_null=0 */
 ### SET
 ### @1=2 /* INT meta=0 nullable=0 is_null=0 */
 ### @2=4 /* INT meta=0 nullable=1 is_null=0 */
 # at 1488
 #181012 8:27:44 server id 1 end_log_pos 1515 Xid = 498
 COMMIT/*!*/;

Going back to the slave, I have left slave exec mode as strict and started replication. As expected, it failed. I then switched slave exec mode to idempotent and started replication again before checking the table and seeing that it was unchanged. The updated record on the master was not present on the slave.

mysql> start slave;
 Query OK, 0 rows affected (0.09 sec)
 
 mysql> show slave status \G
 *************************** 1. row ***************************
 .....
  Last_Error: Could not execute Update_rows event on table idem.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log centos7-1-bin.000003, end_log_pos 1488
 .....
 
 mysql> stop slave;
 Query OK, 0 rows affected (0.11 sec)
 
 mysql> set global slave_exec_mode = 'idempotent';
 Query OK, 0 rows affected (0.00 sec)
 
 mysql> start slave;
 Query OK, 0 rows affected (0.12 sec)
 
 mysql> show slave status \G
 *************************** 1. row ***************************
  Slave_IO_State: Waiting for master to send event
 .....
  Slave_IO_Running: Yes
  Slave_SQL_Running: Yes
 .....
  Seconds_Behind_Master: 0
 .....
  Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
 .....
 1 row in set (0.00 sec)
 
 mysql> select * from t1;
 +----+------+
 | c1 | c2 |
 +----+------+
 | 1 | 2 |
 +----+------+
 1 row in set (0.00 sec)

Going back to the Ansible state-minded definition of idempotency, I think it could be argued that this is not idempotent. If the idempotency in MySQL was representative of state in the same way that Ansible is, you would figure that the following rules would apply: Insert: If you get an insert violation on a unique key, update the record on the slave so that it matches the state based on the full row image. Update: If you get an update error due to the record not being in the table, insert the record so that it matches the state based on the full row image. Delete: If you get a delete error stating that the target record cannot be found in the table, do nothing and move on to the next statement. However, you can see in the example above that this is not the case. I do realize that an argument could be made about the state of the slave’s table not changing as the failed statement can be “applied multiple times without changing the result beyond the initial application,” but I think we do need to be aware of what people consider when they think of the term ‘idempotent’ and make sure that we have the right expectations when using an idempotent mode in MySQL. One last thing to consider is if we should be using this mode. The most common application of this mode that I’ve heard about is people trying to get past replication issues, much like you would do with pt-slave-restart. Get the failure, set to idempotent, let replication run until it’s caught up, set it back to strict, then hope for the best. I would suggest that first and foremost, you should use a combination of mysqlbinlog on the master and MySQL on the slave to correct data issues, if possible, in order to ensure that strict mode can be used. But if skipping transaction is unavoidable and absolutely necessary, I would recommend using pt-slave-restart. The reason being that with pt-slave-restart you can log the replication failures, which will provide you a list of tables that may have been impacted, which increases your odds of being able to fix the replication issues further down the line with single table reseeds or a targeted use of pt-table-sync. If you use the idempotent slave exec mode you don’t get these logs.

Conclusion

I will acknowledge that it’s debatable as to whether ‘idempotent’ was the correct word to use for the slave exec mode that skips key violations, but I think it’s important that we understand what this mode actually does in order to become better operational DBAs. We also need to consider what we may lose when using idempotent mode when it comes to logging, which may be important when trying to resolve replication issues with larger data sets.

Insight and analysis of technology and business strategy

Is slave_exec_mode idempotent actually idempotent?

Conclusion

Pythian Blogs

Tell us how we can help!