Recently we faced an issue where the config management software had automatically upgraded the mariadb-server-10.1 package to the latest 10.1.31 version. This upgrade broke the galera cluster setup for this installation.
I set out to recreate the issue in my local lab setup and managed to reproduce it.
I created a three-node Galera setup: galera1 (192.168.55.100), galera2 (192.168.55.101) and galera3 (192.168.55.102). All three servers run MariaDB 10.1.30, and Galera replication is working fine.
This is the basic galera config:
# cat /etc/mysql/conf.d/cluster.cnf
#########################################################
# Galera config
#########################################################
[mysqld]
wsrep_on = ON
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options =
wsrep_cluster_name = pxc_bootstrap
wsrep_cluster_address = gcomm://192.168.55.100,192.168.55.101,192.168.55.102
wsrep_node_address = 192.168.55.101
wsrep_log_conflicts = 1
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = sstuser:sstpass
# Galera overrides
binlog_format = ROW
innodb_autoinc_lock_mode = 2
When I upgraded the galera2 node to 10.1.31, it would no longer rejoin the cluster:
2018-02-19 18:07:09 140541460941568 [Note] WSREP: State transfer required: Group state: ba08f7ac-1589-11e8-8944-27e143bb408f:285676 Local state: ba08f7ac-1589-11e8-8944-27e143bb408f:273759
2018-02-19 18:07:09 140541460941568 [Note] WSREP: New cluster view: global state: ba08f7ac-1589-11e8-8944-27e143bb408f:285676, view# 33: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-02-19 18:07:09 140541460941568 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-02-19 18:07:09 140541165565696 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.55.101' --datadir '/var/lib/mysql/' --parent '4754' '' '
WSREP_SST: [INFO] Streaming with xbstream (20180219 18:07:09.327)
WSREP_SST: [INFO] Using socat as streamer (20180219 18:07:09.329)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20180219 18:07:09.332)
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20180219 18:07:09.354)
2018-02-19 18:07:11 140541190731520 [Note] WSREP: (b2ffafb5, 'tcp://0.0.0.0:4567') connection to peer b2ffafb5 with addr tcp://192.168.55.101:4567 timed out, no messages seen in PT3S
2018-02-19 18:07:11 140541190731520 [Note] WSREP: (b2ffafb5, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [ERROR] Possible timeout in receving first data from donor in gtid stage (20180219 18:08:49.362)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20180219 18:08:49.364)
2018-02-19 18:08:49 140541165565696 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.55.101' --datadir '/var/lib/mysql/' --parent '4754' '' Read: '(null)'
2018-02-19 18:08:49 140541165565696 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '192.168.55.101' --datadir '/var/lib/mysql/' --parent '4754' '' : 32 (Broken pipe)
2018-02-19 18:08:49 140541460941568 [ERROR] WSREP: Failed to prepare for 'xtrabackup-v2' SST. Unrecoverable.
2018-02-19 18:08:49 140541460941568 [ERROR] Aborting
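The first line of the joiner log shows why a state transfer was requested: the local sequence number lags behind the group's. As a small illustration, the sketch below pulls both sequence numbers out of such a log and prints the difference. The function name seqno_gap is hypothetical, and the parsing assumes the "Group state: <uuid>:<seqno>" and "Local state: <uuid>:<seqno>" wording seen in the log above.

```shell
#!/usr/bin/env bash
# Sketch: report how far a joiner is behind the cluster, based on the
# "State transfer required" message in the server log.
# seqno_gap is a hypothetical helper name, not part of MariaDB or Galera.

# Reads log text on stdin and prints "<group_seqno> <local_seqno> <gap>".
# Returns 1 if either state is missing from the input.
seqno_gap() {
    local input group joiner
    input=$(cat)
    # Each state has the form "<cluster-uuid>:<seqno>"; keep only the part after the last colon.
    group=$(printf '%s\n' "$input" | grep -oE 'Group state: [0-9a-f-]+:-?[0-9]+' | head -n 1 | sed 's/.*://')
    joiner=$(printf '%s\n' "$input" | grep -oE 'Local state: [0-9a-f-]+:-?[0-9]+' | head -n 1 | sed 's/.*://')
    [ -n "$group" ] && [ -n "$joiner" ] || return 1
    echo "$group $joiner $((group - joiner))"
}
```

Fed the joiner log above (for example with seqno_gap < /path/to/error.log, where the path is whatever your server logs to), this prints 285676 273759 11917, so galera2 was 11917 sequence numbers behind when it tried to rejoin.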
The xtrabackup-script on the donor side is never executed:
2018-02-19 18:07:08 139902381455104 [Note] WSREP: Node b91cf7da state prim
2018-02-19 18:07:08 139902381455104 [Note] WSREP: view(view_id(PRIM,b2ffafb5,33) memb { b2ffafb5,0 b91cf7da,0 c9241678,0 } joined { } left { } partitioned { })
2018-02-19 18:07:08 139902381455104 [Note] WSREP: save pc into disk
2018-02-19 18:07:08 139902373062400 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2018-02-19 18:07:08 139902373062400 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-02-19 18:07:09 139902373062400 [Note] WSREP: STATE EXCHANGE: sent state msg: b3995870-159f-11e8-8bdd-c675484f2591
2018-02-19 18:07:09 139902373062400 [Note] WSREP: STATE EXCHANGE: got state msg: b3995870-159f-11e8-8bdd-c675484f2591 from 0 (galera2)
2018-02-19 18:07:09 139902373062400 [Note] WSREP: STATE EXCHANGE: got state msg: b3995870-159f-11e8-8bdd-c675484f2591 from 1 (galera3)
2018-02-19 18:07:09 139902373062400 [Note] WSREP: STATE EXCHANGE: got state msg: b3995870-159f-11e8-8bdd-c675484f2591 from 2 (galera1)
2018-02-19 18:07:09 139902373062400 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 32, members = 2/3 (joined/total), act_id = 285676, last_appl. = 0, protocols = 0/7/3 (gcs/repl/appl), group UUID = ba08f7ac-1589-11e8-8944-27e143bb408f
2018-02-19 18:07:09 139902373062400 [Note] WSREP: Flow-control interval: [28, 28]
2018-02-19 18:07:09 139902373062400 [Note] WSREP: Trying to continue unpaused monitor
2018-02-19 18:07:09 139902653332224 [Note] WSREP: New cluster view: global state: ba08f7ac-1589-11e8-8944-27e143bb408f:285676, view# 33: Primary, number of nodes: 3, my index: 2, protocol version 3
2018-02-19 18:07:09 139902653332224 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-02-19 18:07:09 139902653332224 [Note] WSREP: REPL Protocols: 7 (3, 2)
2018-02-19 18:07:09 139902653332224 [Note] WSREP: Assign initial position for certification: 285676, protocol version: 3
2018-02-19 18:07:09 139902431176448 [Note] WSREP: Service thread queue flushed.
2018-02-19 18:07:11 139902381455104 [Note] WSREP: (c9241678, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-02-19 18:08:51 139902381455104 [Note] WSREP: declaring b91cf7da at tcp://192.168.55.102:4567 stable
2018-02-19 18:08:51 139902381455104 [Note] WSREP: forgetting b2ffafb5 (tcp://192.168.55.101:4567)
2018-02-19 18:08:51 139902381455104 [Note] WSREP: Node b91cf7da state prim
2018-02-19 18:08:51 139902381455104 [Note] WSREP: view(view_id(PRIM,b91cf7da,34) memb { b91cf7da,0 c9241678,0 } joined { } left { } partitioned { b2ffafb5,0 })
After some digging in the wsrep_sst_xtrabackup-v2 script, I traced the problem back to the wait_for_listen function. This function was rewritten in MariaDB 10.1.31 to add support for FreeBSD, and the rewrite appears to have broken it on Linux.
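To make the role of such a function clearer, here is a minimal sketch of a wait-for-listener loop. It is not the code from wsrep_sst_xtrabackup-v2: the function names port_is_listening and wait_for_port, the use of ss, and the retry defaults are all my own assumptions for illustration. The idea is that the joiner should only announce itself once its socat listener (port 4444 in the log above) is accepting connections. The "Failed to read 'ready <addr>'" error in the joiner log suggests that this announcement was never made.

```shell
#!/usr/bin/env bash
# Simplified sketch of a "wait until the SST listener is up" loop.
# This is NOT the code shipped in wsrep_sst_xtrabackup-v2; all names are hypothetical.

# Succeeds if something is listening on TCP port $1.
# Assumes the Linux iproute2 tool `ss` is installed.
port_is_listening() {
    ss -ltn 2>/dev/null |
        awk -v port="$1" 'NR > 1 && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Polls until the port is listening or the attempts run out.
# $1 = port, $2 = max attempts (default 10), $3 = seconds between attempts (default 1)
wait_for_port() {
    local port=$1 attempts=${2:-10} delay=${3:-1} i=0
    while [ "$i" -lt "$attempts" ]; do
        if port_is_listening "$port"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

A joiner script built this way would start its listener in the background, call wait_for_port 4444, and only then print the 'ready <addr>' line that the server is waiting to read. If the listening check never succeeds on a given platform, the server never sees that line and the donor is never asked to start, which matches the symptoms in both logs.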
I have created a bug report in the MariaDB Jira but if you’re using MariaDB with Galera cluster, I suggest you wait a while before upgrading your installation to 10.1.31.
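Since the upgrade in our case was triggered automatically, one way to "wait a while" on a Debian-style system is to put the package on hold. The sketch below assumes apt is the package manager and uses the package name from this post. The function name hold_package and the DRY_RUN switch are hypothetical, and whether your config management tool respects an apt hold depends on the tool and how it is configured.

```shell
#!/usr/bin/env bash
# Sketch: keep apt from upgrading the MariaDB server package until a fixed release is out.
# Assumes a Debian/Ubuntu host; hold_package and DRY_RUN are hypothetical names.

# $1 = package name (defaults to the package discussed in this post).
# With DRY_RUN=1 the command is only printed, not executed.
hold_package() {
    local pkg=${1:-mariadb-server-10.1}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "apt-mark hold $pkg"
    else
        apt-mark hold "$pkg"
    fi
}
```

Running hold_package as root marks the package as held, and apt-mark unhold mariadb-server-10.1 releases it again once a fixed version is available.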
3 Comments
You can download https://raw.githubusercontent.com/MariaDB/server/10.2/scripts/wsrep_sst_xtrabackup-v2.sh and use it to replace the script at /usr/bin/wsrep_sst_xtrabackup-v2
It will work again.
It seems to be another good update for MariaDB, which will be helpful in different respects.
I have the same problem when deploying xtrabackup-v2 on MariaDB 10.2, and again when deploying on MariaDB 10.4. I always get the log message “Resource Limits:” although my server has few