Split brain solution using pg_rewind in PostgreSQL

Ishteyaque

October 4, 2022

Tags: Repmgr, Pg_Rewind, Technical Track, Auto Failover, Failover In Postgresql, Split Brain In Postgresql, Postgresql With Repmgr Integration, Repmgrd, Postgresql

Summary

In this blog I will explain pg_rewind use case to solve split brain after a database failover happens.

Split Brain Syndrome

Split Brain is often used to describe the scenario when two or more nodes in a cluster, lose connectivity with one another but then continue to operate independently of each other.

Streaming replication

Streaming replication, a standard feature of PostgreSQL, allows the updated information on the primary server to be transferred to the standby server in real time, so that the databases of the primary server and standby server can be kept in sync.

Repmgr Extension

repmgr is an open-source toolset from 2ndQuadrant, a leading specialist in PostgreSQL-related technologies and services. The product is used to automate, enhance, and manage PostgreSQL streaming replication.

repmgrd (daemon)

repmgrd is a daemon that runs on each PostgreSQL node, monitoring the local node, and (unless it’s the primary node) the upstream server (the primary server or with cascading replication, another standby) which it’s connected to repmgrd can be configured to provide failover capability in case the primary upstream node becomes unreachable, and/or provide monitoring data to the repmgr meta database.

Database Architecture

I configured two nodes PostgreSQL database with streaming replication technique and installed repmgr tool to achieve auto failover in PostgreSQL. After that started repmgrd daemon process.

[postgres@fahad archive]$ repmgr daemon status
 ID | Name                 | Role    | Status    | Upstream             | repmgrd | PID   | Paused? | Upstream last seen
----+----------------------+---------+-----------+----------------------+---------+-------+---------+--------------------
 1  | athar.postgres-1.com | primary | * running |                      | running | 60943 | no      | n/a
 2  | fahad.postgres-2.com | standby |   running | athar.postgres-1.com | running | 70493 | no      | 2 second(s) ago

Scenario:

I forcefully killed the postmaster process on the primary database to do the failover. After failover, the old primary was stopped, and the other node started working as primary. Then I restarted the old primary manually and found that both nodes’ roles were primary.

This is one example of how a split-brain syndrome occurs in PostgreSQL.

[postgres@fahad ~]$ repmgr cluster show --verbose
NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf"
INFO: connecting to database
 ID | Name                 | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                       
----+----------------------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
 1  | athar.postgres-1.com | primary | ! running |          | default  | 100      | 1        | host=192.168.202.228 user=postgres dbname=postgres connect_timeout=2
 2  | fahad.postgres-2.com | primary | * running |          | default  | 100      | 2        | host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2

WARNING: following issues were detected
  - node "athar.postgres-1.com" (ID: 1) is running but the repmgr node record is inactive

[postgres@athar ~]$ repmgr daemon status
 ID | Name                 | Role    | Status               | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+----------------------+---------+----------------------+----------+---------+------+---------+--------------------
 1  | athar.postgres-1.com | primary | * running            |          | running | 5084 | no      | n/a
 2  | fahad.postgres-2.com | standby | ! running as primary |          | running | 2267 | no      | n/a

WARNING: following issues were detected
  - node "fahad.postgres-2.com" (ID: 2) is registered as standby but running as primary

Solution:

As mentioned above, we have seen a split brain between clusters. Both nodes work as primary and independent.

Here we need the old primary to rejoin the replica configuration and rewind that database to sync the log sequence from the current primary. This will discard any transactions that could have arrived at the old primary after the automatic switchover happened and was started. Before rewind, just stop all database services on node1.

rewind dry run to verify all pre-requisites are met

[postgres@athar ~]$ repmgr node rejoin -d 'host=192.168.202.229 dbname=postgres user=postgres' --config-files=/var/lib/pgsql/12/data/postgresql.conf --verbose --force-rewind --dry-run
NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf"
INFO: replication slots in use, 2 free slots on node 20
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6967965196868380905
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/A000028
INFO: prerequisites for using pg_rewind are met
INFO: temporary archive directory "/tmp/repmgr-config-archive-athar.postgres-1.com" created
WARNING: specified file "/var/lib/pgsql/12/data//var/lib/pgsql/12/data/postgresql.conf" not found, skipping
INFO: 0 files would have been copied to "/tmp/repmgr-config-archive-athar.postgres-1.com"
INFO: temporary archive directory "/tmp/repmgr-config-archive-athar.postgres-1.com" deleted
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met

Run rewind

[postgres@athar ~]$ repmgr node rejoin -d 'host=192.168.202.229 dbname=postgres user=postgres'  --config-files=/var/lib/pgsql/12/data/postgresql.conf --verbose --force-rewind
NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf"
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/A000028
INFO: prerequisites for using pg_rewind are met
WARNING: specified file "/var/lib/pgsql/12/data//var/lib/pgsql/12/data/postgresql.conf" not found, skipping
INFO: 0 files copied to "/tmp/repmgr-config-archive-athar.postgres-1.com"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2'"
NOTICE: 0 files copied to /var/lib/pgsql/12/data
INFO: directory "/tmp/repmgr-config-archive-athar.postgres-1.com" deleted
INFO: creating replication slot as user "postgres"
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.202.228 user=postgres dbname=postgres connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "pg_ctl  -w -D '/var/lib/pgsql/12/data' start"
INFO: node "athar.postgres-1.com" (ID: 1) is pingable
INFO: node "athar.postgres-1.com" (ID: 1) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2

Cluster Status after rewind

[postgres@athar ~]$ repmgr daemon status
 ID | Name                 | Role    | Status    | Upstream             | repmgrd | PID  | Paused? | Upstream last seen
----+----------------------+---------+-----------+----------------------+---------+------+---------+--------------------
 1  | athar.postgres-1.com | standby |   running | fahad.postgres-2.com | running | 6345 | no      | 0 second(s) ago
 2  | fahad.postgres-2.com | primary | * running |                      | running | 5184 | no      | n/a

Insight and analysis of technology and business strategy