Beware Starting Slaves in the Position in the master.info file

Posted in: Technical Track

I’ve seen many a good DBA make the master of starting slaves from the position in the master.info file, most recently this week, that I want to bring it to everyone’s attention. Of course I mean the underlying issue and not the names of the DBA because that would be cruel.

In the typical scenario where this is an issue, the sequence of events is roughly the same with some small variation. A cold backup or a snapshot is restored onto a new server to build out a new slave. The binary log position from the master.info file, which is part of the backup, is used to start replication. Eventually after a short while, someone notices data discrepancies on the new slave compared to the master or replication stops due to an error.

The problem can be best looked by looking the slave status output in MySQL like below:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: msandbox
                  Master_Port: 26768
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 4723
               Relay_Log_File: mysql_sandbox26769-relay-bin.000002
                Relay_Log_Pos: 874
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 729
              Relay_Log_Space: 1042
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)

mysql>

There are two sets of coordinates for replication in consideration here, the Execute Master Position and the Read Master Position. The Execute Master Position, denoted by the ‘Relay_Master_Log_File’ and Exec_Master_Log_Pos’ in the output above, is the position in the master’s binary log of the current statement being executed by the slave from the relay log. The Read Master Position, denoted by the ‘Master_Log_File’ and Exec_Master_Log_Pos’ in the output above, is the position in the master’s binary logs which the slave is reading from and writing to the relay log. So if the slave is lagged for some reason, which can happen often in the MySQL world, the Read Master Position will be different and in fact newer then the Execute Master Position. On the file system, the Read Master Position is stored in the master.info file and the Execute Master Position is stored instead in the relay-log.info file.

So if the slave was lagged when it was shut down for the cold backup or when the snapshot was taken, then the two positions would be different. Consider the following example from a server which was lagged behind.

The master.info file shows

--(Wed:20110112:1527)-(0:$)-- more master.info
15
mysql-bin.000001
4723
127.0.0.1
msandbox
msandbox
26768
60
0

0
--([email protected])-(/home/wang/sandboxes/rsandbox_5_1_53/node1/data)--

The relay-log.info file shows

--(Wed:20110112:1527)-(0:$)-- more relay-log.info
./mysql_sandbox26769-relay-bin.000002
874
mysql-bin.000001
729
--([email protected])-(/home/wang/sandboxes/rsandbox_5_1_53/node1/data)--

When the replication is started on the new server from the position in the master.info file, the statements in the binary logs between the Execute Master Position and the Read Master Position get skipped and cause data corruption.

In the end, the solution is to always start a new slave using the position in the relay-log.info file.

email

Author

Want to talk with an expert? Schedule a call with our team to get the conversation started.

3 Comments. Leave new

Why not get Master_Log_File and Exec_Master_Log_Pos from `show slave status` ?

Never try to get information from a file.

Reply

You could get it from ‘show slave status’. The only thing you must take care is to stop the slave, get the position, and then do the backup (snapshot or cold backup).

But when MySQL starts up, it reads it from both the master.info file and the relay-log.info file. Hence we need to parse it properly as DBAs.

Reply

So if the slave was not lagging before stopping it as we ensure that lag is 0, will it be still an issue to use master.info file?

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *