Split brain solution using pg_rewind in PostgreSQL

Posted in: Technical Track
Split brain solution using pg_rewind in PostgreSQL
Summary

In this blog I will explain pg_rewind use case to solve split brain after a database failover happens.

Split Brain Syndrome

Split Brain is often used to describe the scenario when two or more nodes in a cluster, lose connectivity with one another but then continue to operate independently of each other.

Streaming replication

Streaming replication, a standard feature of PostgreSQL, allows the updated information on the primary server to be transferred to the standby server in real time, so that the databases of the primary server and standby server can be kept in sync.

Repmgr Extension

repmgr is an open-source toolset from 2ndQuadrant, a leading specialist in PostgreSQL-related technologies and services. The product is used to automate, enhance, and manage PostgreSQL streaming replication.

repmgrd (daemon)

repmgrd is a daemon that runs on each PostgreSQL node, monitoring the local node, and (unless it’s the primary node) the upstream server (the primary server or with cascading replication, another standby) which it’s connected to repmgrd can be configured to provide failover capability in case the primary upstream node becomes unreachable, and/or provide monitoring data to the repmgr meta database.

Database Architecture

I configured two nodes PostgreSQL database with streaming replication technique and installed repmgr tool to achieve auto failover in PostgreSQL. After that started repmgrd daemon process.

 

[[email protected] archive]$ repmgr daemon status
 ID | Name                 | Role    | Status    | Upstream             | repmgrd | PID   | Paused? | Upstream last seen
----+----------------------+---------+-----------+----------------------+---------+-------+---------+--------------------
 1  | athar.postgres-1.com | primary | * running |                      | running | 60943 | no      | n/a
 2  | fahad.postgres-2.com | standby |   running | athar.postgres-1.com | running | 70493 | no      | 2 second(s) ago

 

Scenario:

I forcefully killed the postmaster process on the primary database to do the failover. After failover, the old primary was stopped, and the other node started working as primary. Then I restarted the old primary manually and found that both nodes’ roles were primary.

This is one example of how a split-brain syndrome occurs in PostgreSQL.

[[email protected] ~]$ repmgr cluster show --verbose
NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf"
INFO: connecting to database
 ID | Name                 | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                       
----+----------------------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------
 1  | athar.postgres-1.com | primary | ! running |          | default  | 100      | 1        | host=192.168.202.228 user=postgres dbname=postgres connect_timeout=2
 2  | fahad.postgres-2.com | primary | * running |          | default  | 100      | 2        | host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2

WARNING: following issues were detected
  - node "athar.postgres-1.com" (ID: 1) is running but the repmgr node record is inactive

[[email protected] ~]$ repmgr daemon status
 ID | Name                 | Role    | Status               | Upstream | repmgrd | PID  | Paused? | Upstream last seen
----+----------------------+---------+----------------------+----------+---------+------+---------+--------------------
 1  | athar.postgres-1.com | primary | * running            |          | running | 5084 | no      | n/a
 2  | fahad.postgres-2.com | standby | ! running as primary |          | running | 2267 | no      | n/a

WARNING: following issues were detected
  - node "fahad.postgres-2.com" (ID: 2) is registered as standby but running as primary
 
Solution:

As mentioned above, we have seen a split brain between clusters. Both nodes work as primary and independent.

Here we need the old primary to rejoin the replica configuration and rewind that database to sync the log sequence from the current primary. This will discard any transactions that could have arrived at the old primary after the automatic switchover happened and was started. Before rewind, just stop all database services on node1.

rewind dry run to verify all pre-requisites are met
[[email protected] ~]$ repmgr node rejoin -d 'host=192.168.202.229 dbname=postgres user=postgres' --config-files=/var/lib/pgsql/12/data/postgresql.conf --verbose --force-rewind --dry-run
NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf"
INFO: replication slots in use, 2 free slots on node 20
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6967965196868380905
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/A000028
INFO: prerequisites for using pg_rewind are met
INFO: temporary archive directory "/tmp/repmgr-config-archive-athar.postgres-1.com" created
WARNING: specified file "/var/lib/pgsql/12/data//var/lib/pgsql/12/data/postgresql.conf" not found, skipping
INFO: 0 files would have been copied to "/tmp/repmgr-config-archive-athar.postgres-1.com"
INFO: temporary archive directory "/tmp/repmgr-config-archive-athar.postgres-1.com" deleted
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
  pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2'
INFO: prerequisites for executing NODE REJOIN are met

 

Run rewind
[[email protected] ~]$ repmgr node rejoin -d 'host=192.168.202.229 dbname=postgres user=postgres'  --config-files=/var/lib/pgsql/12/data/postgresql.conf --verbose --force-rewind
NOTICE: using provided configuration file "/var/lib/pgsql/repmgr.conf"
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/A000028
INFO: prerequisites for using pg_rewind are met
WARNING: specified file "/var/lib/pgsql/12/data//var/lib/pgsql/12/data/postgresql.conf" not found, skipping
INFO: 0 files copied to "/tmp/repmgr-config-archive-athar.postgres-1.com"
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.202.229 user=postgres dbname=postgres connect_timeout=2'"
NOTICE: 0 files copied to /var/lib/pgsql/12/data
INFO: directory "/tmp/repmgr-config-archive-athar.postgres-1.com" deleted
INFO: creating replication slot as user "postgres"
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.202.228 user=postgres dbname=postgres connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "pg_ctl  -w -D '/var/lib/pgsql/12/data' start"
INFO: node "athar.postgres-1.com" (ID: 1) is pingable
INFO: node "athar.postgres-1.com" (ID: 1) has attached to its upstream node
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2
 
Cluster Status after rewind
[[email protected] ~]$ repmgr daemon status
 ID | Name                 | Role    | Status    | Upstream             | repmgrd | PID  | Paused? | Upstream last seen
----+----------------------+---------+-----------+----------------------+---------+------+---------+--------------------
 1  | athar.postgres-1.com | standby |   running | fahad.postgres-2.com | running | 6345 | no      | 0 second(s) ago
 2  | fahad.postgres-2.com | primary | * running |                      | running | 5184 | no      | n/a

 

email
Want to talk with an expert? Schedule a call with our team to get the conversation started.

About the Author

Database Consultant
Md Athar Ishteyaque is certified EDB PostgreSQL & Oracle Database Architecture / Administration with over 8 years of experience in database design & management, Performance-tuning, database backup & recovery and extending support to technical teams for optimizing application performance with good knowledge. Extensive experience in installation, Configuration, Database Replication , clustering, database migration, backup and recovery. Insightful understanding of applying Patches to Standalone, DR & RAC Environment, Migration activities from standalone to RAC and vice versa. Excellent communication, collaboration & team building skills with proficiency at grasping new technical concepts quickly and utilizing the same in a productive manner

1 Comment. Leave new

Thank you!
Your article save my time :)

Reply

Leave a Reply

Your email address will not be published. Required fields are marked *