Or, When MySQL Lies!
When I do a
show slave status\G, sometimes mysqld will lie to me and give me a wrong
Exec_Master_Log_Pos. Let me explain with a situation from last night.
This is the output of
show slave status\G from mysql version 5.0.41-community-log:
mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: XXX.XXX.XXX.XXX Master_User: replic_username Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000480 Read_Master_Log_Pos: 690470773 Relay_Log_File: db2-relay-bin.000028 Relay_Log_Pos: 683977007 Relay_Master_Log_File: mysql-bin.000480 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Skip_Counter: 0 Exec_Master_Log_Pos: 126 Relay_Log_Space: 690471192 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL
So in summary, the slave SQL thread is stuck (in this case because of a problem during the transfer of the binlog data to the slave’s relay log). The
show slave status\G command tells me that it is stuck at the master binlog file
But, if I look at the error log file entries when the slave got stuck I see:
080206 16:03:48 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013) 080206 16:03:48 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000480' position 683976897 080206 16:03:48 [Note] Slave: connected to master '[email protected]:3306',replication resumed in log 'mysql-bin.000480' at position 683976897 080206 16:04:31 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 1966107, event_type: 16 080206 16:04:31 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error 080206 16:04:31 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0 080206 16:04:31 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000480' position 126
I’ll compare both outputs.
show slave status tells me that I’m stuck at:
Relay_Master_Log_File: mysql-bin.000480 Exec_Master_Log_Pos: 126
The error log file says at first that it is stuck at:
'mysql-bin.000480' position 683976897
then later in the error log is says it is stuck at:
'mysql-bin.000480' position 126
So obviously, the last information given in the error log as well as in the
show slave status command is wrong. If I had to choose, I would restart the slave at log
683976897 using the command:
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000480', MASTER_LOG_POS=683976897;. If not, I’ll be going to try to re-run a full binlog that has been already applied.
So, when a slave gets stuck on relay log corruption, my own general rules are as follows:
- If the position is less then, say, 500, suspect that that the position is wrong (it might be right but it needs to be checked out).
- Always go in the error log to see what it has to say about position.
- If the position in the error log as well as in the
show slave statusagree, more-or-less (both are a big number), choose the one from
show slave status. Otherwise, pick the one in the error log.
NOTE: Please understand that it is possible for the slave to be stuck at a small position. This would happen if the master did change the binlog and the slave got stuck just after the switch of the master binlog. So be careful.
There are three sets of file/position coordinates in SHOW SLAVE STATUS:
1) The position, ON THE MASTER, from which the I/O thread is reading: Master_Log_File/Read_Master_Log_Pos.
2) The position, IN THE RELAY LOGS, at which the SQL thread is executing: Relay_Log_File/Relay_Log_Pos
3) The position, ON THE MASTER, at which the SQL thread is executing: Relay_Master_Log_File/Exec_Master_Log_Pos
You are mixing and matching from different sets, and you can’t do that. In many cases they’ll have the same thing and it doesn’t matter. But not in the general case.
Thanks Xaprb! I had highlighted the wrong line. It has been fixed.
Thanks again for catching it!
What if “exec_master_log_pos” value is more than “read_master_log_pos” value? The situation is that replication is broken because of a missing table at slave end that needs to be updated.
What does this mean?