Point-In-Time-Restore using pg_basebackup on PostgreSQL
I had a conversation with another DBA about interview questions, and one interview topic that came up was using pg_basebackup to restore a database. I had a horrible realisation that I had not done a restore using pg_basebackup in PostgreSQL 15. With modern backup tools, using pg_basebackup is like using a manual screwdriver instead of an electrically-powered screwdriver; it gets the job done, but much more effort is involved.
However, sometimes pg_basebackup is the only tool available.
So, in this blog, we’ll look at PostgreSQL’s recovery options and their implications for restoring. We’ll also walk through a simple restore using pg_basebackup after a user error. I’ll be using a PG 15 database for these tests.
Recovery Options
There are several recovery options available. They include:
- recovery_target – The most commonly used option. If this parameter is set to “immediate”, recovery ends as soon as the database cluster reaches a read-consistent state.
- recovery_target_lsn – Recovers up to the specified LSN (Log Sequence Number). Whether the transaction committing exactly at that LSN is included is controlled by the recovery_target_inclusive parameter; with it set to on, transactions up to and including that LSN are recovered.
- recovery_target_time – Specifies a timestamp at which the recovery process will stop. The exact stop point is finally determined by the recovery_target_inclusive parameter.
- recovery_target_xid – Recovers up to a specific XID.
- recovery_target_name – Recovers up to a named restore point, created earlier with the pg_create_restore_point() function.
Note: These recovery options listed above are available from PG v12.
Here is the official documentation link PostgreSQL 15 Recovery Target Parameters.
In complex recovery situations, the option recovery_target_timeline can be used.
- recovery_target_timeline – Specifies recovering into a particular timeline. The value can be numeric or a special value such as current (the timeline that was current when the backup was taken) or latest (the newest timeline found in the archive). The option “recovery_target_timeline” is used in complex recovery scenarios. When a recovered cluster leaves recovery, it switches to a new timeline, and subsequent WAL segments carry the new timeline ID in their file names; this stops the post-recovery WAL from overwriting the original history, so the new WAL can diverge from the original set (a divergent timeline).
Some other options are used in tandem with the recovery option specified.
- recovery_target_action – This parameter specifies what happens when the recovery target is reached. The options are [pause, promote and shutdown], with pause being the default. Pause is helpful as it lets a DBA check whether the recovery target is the right one; the pg_wal_replay_resume() function then ends the recovery, i.e. opens the database. Promote makes the target instance accept connections immediately, so it should not be used when restoring a STANDBY. Shutdown stops the instance once the target is reached, leaving it ready to replay further WAL files if restarted.
- recovery_target_inclusive – This parameter has two options, on and off. It works with parameters recovery_target_lsn, recovery_target_time, or recovery_target_xid. It determines whether transactions committing exactly at the target LSN, commit time or transaction ID are included in the recovery. The default is on, which includes them; if set to off, recovery stops just before the target.
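As a sketch of the pause workflow described above (the table name is hypothetical, and this assumes recovery has paused at the target):

```sql
-- While recovery is paused at the target, verify the data is what you expect:
SELECT count(*) FROM public.t1;

-- Happy? End the recovery and open the database for read-write connections:
SELECT pg_wal_replay_resume();
```

If the data is not what you expected, you can instead move the target further forward in the configuration and restart the server to replay more WAL.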
Putting this all together, one option should be picked from:
- recovery_target, recovery_target_lsn, recovery_target_xid, recovery_target_name, recovery_target_time
- and is influenced by recovery_target_action and recovery_target_inclusive.
If no recovery target is set at all, the restore simply applies all the WAL available. Setting “recovery_target” to immediate, by contrast, stops the recovery as soon as a consistent state is reached.
Note: If the WAL files have been copied to another location, then the “restore_command” parameter may be needed to get them back. Of course, the WAL files can be copied manually.
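Putting the target options and the restore_command note together, a minimal sketch of how this might look in postgresql.conf on PG 12+ follows (the archive path is hypothetical; these settings replaced the old recovery.conf file used in PG 11 and earlier):

```ini
# postgresql.conf – recovery settings sketch (PG 12+; /tmp/wal_archive is hypothetical)
restore_command = 'cp /tmp/wal_archive/%f "%p"'   # how to fetch archived WAL segments
recovery_target_time = '2023-02-23 14:32:00'      # pick exactly ONE recovery_target* option
recovery_target_inclusive = on                    # on = include transactions at the target (default)
recovery_target_action = 'pause'                  # pause | promote | shutdown
# recovery_target_timeline = 'latest'             # only needed in multi-timeline scenarios
```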
However, for a PITR (Point-In-Time Recovery), the other options are usually more useful. Personally, I prefer to restore using the “recovery_target_time” option. LSN is another good choice, as it pins recovery to an exact WAL position. If you are monitoring DDL statements and something bad occurs, you can find the time or LSN corresponding to the issue and restore to just before that point. Some data may still be lost, so it is important to get as near to that time as possible.
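If you need to pin down an exact LSN rather than a time, the pg_waldump utility can help: transaction COMMIT records show both their LSN and their commit timestamp. A sketch (the WAL segment file name here is hypothetical):

```shell
# List transaction COMMIT records (each shows its LSN and commit timestamp)
# in the WAL segment covering the incident, so a precise target can be picked.
pg_waldump /var/lib/pgsql/15/data/pg_wal/000000010000000D00000094 | grep COMMIT
```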
Let’s look at an example that uses the pg_basebackup and the “recovery_target_time” option to do a PITR after a table was dropped.
Setting Up the Test
This is what my test instance looks like:
db01=# \l+
                                                              List of databases
   Name   |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider | Access privileges |  Size   | Tablespace |                Description
----------+----------+----------+-------------+-------------+------------+-----------------+-------------------+---------+------------+--------------------------------------------
 db01     | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |            | libc            |                   | 11 MB   | pg_default |
 db02     | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |            | libc            |                   | 16 MB   | pg_default |
 pgbench  | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |            | libc            |                   | 8133 kB | pg_default |
 postgres | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |            | libc            |                   | 7717 kB | pg_default | default administrative connection database
We are now going to create a table in the “db01” database:
db01=# CREATE TABLE public.t1 (id integer, curr_time timestamp, notes text);
CREATE TABLE
db01=# INSERT INTO public.t1 VALUES (generate_series(1,100000), now(), md5(random()::text));
INSERT 0 100000
Check the table for data:
db01=# SELECT count(*) FROM public.t1;
 count
--------
 100000

db01=# SELECT * FROM public.t1 LIMIT 2;
 id |         curr_time          |              notes
----+----------------------------+----------------------------------
  1 | 2023-02-23 11:29:54.006385 | 05c4145fe20236a21f6544a673dd2b36
  2 | 2023-02-23 11:29:54.006385 | cf8ebec888adb01c8c409d76a9bbafad

db01=# \l+ db01
                                                 List of databases
 Name |  Owner   | Encoding |   Collate   |    Ctype    | ICU Locale | Locale Provider | Access privileges |  Size  | Tablespace | Description
------+----------+----------+-------------+-------------+------------+-----------------+-------------------+--------+------------+-------------
 db01 | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |            | libc            |                   | 261 MB | pg_default |
We can see the size of the database “db01” has grown.
db01=# \dt+ public.*
                                List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+------+-------+----------+-------------+---------------+--------+-------------
 public | t1   | table | postgres | permanent   | heap          | 169 MB |
Taking the Backup
We can now take a backup of the database cluster using pg_basebackup (on the same host):
[user01@pg-server-db01: postgresql ] $: pg_basebackup -h localhost -p 5442 -U postgres -D /tmp/pg_data_backup -Fp -Xs -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: D/94000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_830833"
316123/316123 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: D/94000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
Options for pg_basebackup are:
- -Fp Format of the backup. Options are “p” for plain and “t” for tar. Plain copies the files in the same layout as the host server’s data directory and tablespaces.
- -Xs Method used for collecting WAL files. The “X” selects the method, and the “s” is for streaming. Other options are: “n” for none, i.e. don’t collect WAL files, and “f” for fetch, which collects the WAL files after the backup has completed.
- -P Show the progress being made.
- -D The target directory that the program writes its output to. This option is mandatory.
Here is the official documentation for pg_basebackup pg_basebackup documentation for PG 15.
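Two variations worth knowing about, neither used in this post: a tar-format backup (-Ft, optionally gzip-compressed with -z), and pg_verifybackup (available from PG 13), which checks a plain-format backup against the backup_manifest file that pg_basebackup writes. A sketch, with the same hypothetical paths as above:

```shell
# Alternative: compressed tar-format backup with streamed WAL
pg_basebackup -h localhost -p 5442 -U postgres -D /tmp/pg_data_backup_tar -Ft -z -Xs -P -v

# Verify an existing plain-format backup against its manifest
pg_verifybackup /tmp/pg_data_backup
```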
We now drop the table “t1” accidentally. Oops!
db01=# DROP TABLE t1;
DROP TABLE
db01=# \dt+ public.*
                                  List of relations
 Schema |   Name   | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------+-------+----------+-------------+---------------+--------+-------------
 public | t2       | table | postgres | permanent   | heap          | 81 MB  |
 public | us_gaz   | table | postgres | permanent   | heap          | 120 kB |
 public | us_lex   | table | postgres | permanent   | heap          | 224 kB |
 public | us_rules | table | postgres | permanent   | heap          | 344 kB |
Restoring the Database to a Point-In-Time
So, now we have to restore. As I have log_statement set to ‘ddl’, I can check what time the table was dropped:
[user01@pg-server-db01: postgresql ] $: grep -i "drop table t1" *
postgresql-2023-02-23.log:2023-02-23 14:32:36.937 GMT [4074990] LOG:  00000: statement: drop table t1
postgresql-2023-02-23.log:2023-02-23 14:32:36.938 GMT [4074990] LOG:  00000: AUDIT: SESSION,8,1,DDL,DROP TABLE,TABLE,public.t1,"drop table t1
So, we set the “recovery_target_time” to 14:32, i.e. ‘2023-02-23 14:32:00’, the time at which the recovery will stop. I have left “recovery_target_inclusive” set to on in this example.
Note: One could set the timestamp to just before the table was dropped. I rounded down to the nearest minute rather than the second; recovering to ‘2023-02-23 14:32:36’ would have been more accurate and lost less data.
Usually, we’d be restoring to another instance, but I’m restoring to the current PGDATA location to save time. Only do this on a test system, as it will delete all your live data.
So, the next steps are to: stop the instance, delete the existing PGDATA directory and restore it back using the cp command:
[user01@pg-server-db01: tmp ] $: sudo systemctl stop postgresql-15
[user01@pg-server-db01: ~ ] $: rm -fr /var/lib/pgsql/15/data/*
[user01@pg-server-db01: ~ ] $: cd /var/lib/pgsql/15/data
[user01@pg-server-db01: ~ ] $: cp -r /tmp/pg_data_backup/* .
[user01@pg-server-db01: data ] $: vi postgresql.conf
## Change restore_command to point at the archived WAL files: 'cp /tmp/pg_backup/%f %p'
## Change recovery_target_time to the desired time: '2023-02-23 14:32:00'
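One step that is easy to forget on PG 12 and later: the recovery_target_* settings are only honoured when the server starts in archive-recovery mode, which is triggered by an empty recovery.signal file in the data directory (this replaced the old recovery.conf mechanism). A sketch, assuming the same PGDATA path as above:

```shell
# Tell PostgreSQL to perform targeted (archive) recovery at the next start
touch /var/lib/pgsql/15/data/recovery.signal
chown postgres:postgres /var/lib/pgsql/15/data/recovery.signal
```

The server removes the file automatically once recovery completes.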
Save the config file.
We can now start the database using the systemd commands. The recovery will be applied automatically:
[user01@pg-server-db01: /tmp ] $: sudo systemctl start postgresql-15
[sudo] password for user01:
[user01@pg-server-db01: tmp ] $: sudo systemctl status postgresql-15
● postgresql-15.service - PostgreSQL 15 database server
     Loaded: loaded (/usr/lib/systemd/system/postgresql-15.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2023-02-23 16:21:02 GMT; 4s ago
       Docs: https://www.postgresql.org/docs/15/static/
    Process: 4090124 ExecStartPre=/usr/pgsql-15/bin/postgresql-15-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
   Main PID: 4090129 (postmaster)
      Tasks: 8 (limit: 202680)
     Memory: 124.9M
        CPU: 61ms
     CGroup: /system.slice/postgresql-15.service
             ├─4090129 /usr/pgsql-15/bin/postmaster -D /var/lib/pgsql/15/data/
             ├─4090130 "postgres: logger "
             ├─4090131 "postgres: checkpointer "
             ├─4090132 "postgres: background writer "
             ├─4090134 "postgres: walwriter "
             ├─4090135 "postgres: autovacuum launcher "
             ├─4090136 "postgres: archiver "
             └─4090137 "postgres: logical replication launcher "

Feb 23 16:21:02 pg-server-db01 systemd[1]: Starting PostgreSQL 15 database server...
Feb 23 16:21:02 pg-server-db01 postmaster[4090129]: 2023-02-23 16:21:02.409 GMT [4090129] LOG:  00000: pgaudit extension initialized
Feb 23 16:21:02 pg-server-db01 postmaster[4090129]: 2023-02-23 16:21:02.409 GMT [4090129] LOCATION:  _PG_init, pgaudit.c:2202
Feb 23 16:21:02 pg-server-db01 postmaster[4090129]: 2023-02-23 16:21:02.437 GMT [4090129] LOG:  00000: redirecting log output to logging collector process
Feb 23 16:21:02 pg-server-db01 postmaster[4090129]: 2023-02-23 16:21:02.437 GMT [4090129] HINT:  Future log output will appear in directory "/var/log/postgresql".
Feb 23 16:21:02 pg-server-db01 postmaster[4090129]: 2023-02-23 16:21:02.437 GMT [4090129] LOCATION:  SysLogger_Start, syslogger.c:712
Feb 23 16:21:02 pg-server-db01 systemd[1]: Started PostgreSQL 15 database server.
We can see in the logs what happened at the restart time:
2023-02-23 16:09:20.387 GMT [4088043] LOG:  00000: database system is shut down
2023-02-23 16:09:20.387 GMT [4088043] LOCATION:  UnlinkLockFiles, miscinit.c:977
2023-02-23 16:21:02.437 GMT [4090129] LOG:  00000: starting PostgreSQL 15.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2), 64-bit
2023-02-23 16:21:02.437 GMT [4090129] LOCATION:  PostmasterMain, postmaster.c:1189
2023-02-23 16:21:02.438 GMT [4090129] LOG:  00000: listening on IPv4 address "0.0.0.0", port 5442
2023-02-23 16:21:02.438 GMT [4090129] LOCATION:  StreamServerPort, pqcomm.c:582
2023-02-23 16:21:02.438 GMT [4090129] LOG:  00000: listening on IPv6 address "::", port 5442
2023-02-23 16:21:02.438 GMT [4090129] LOCATION:  StreamServerPort, pqcomm.c:582
2023-02-23 16:21:02.439 GMT [4090129] LOG:  00000: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5442"
2023-02-23 16:21:02.439 GMT [4090129] LOCATION:  StreamServerPort, pqcomm.c:577
2023-02-23 16:21:02.442 GMT [4090129] LOG:  00000: listening on Unix socket "/tmp/.s.PGSQL.5442"
2023-02-23 16:21:02.442 GMT [4090129] LOCATION:  StreamServerPort, pqcomm.c:577
2023-02-23 16:21:02.444 GMT [4090133] LOG:  00000: database system was interrupted; last known up at 2023-02-23 16:07:42 GMT
2023-02-23 16:21:02.444 GMT [4090133] LOCATION:  StartupXLOG, xlog.c:4962
2023-02-23 16:21:02.492 GMT [4090133] LOG:  00000: redo starts at D/92000028
2023-02-23 16:21:02.492 GMT [4090133] LOCATION:  PerformWalRecovery, xlogrecovery.c:1640
2023-02-23 16:21:02.493 GMT [4090133] LOG:  00000: consistent recovery state reached at D/92000100
2023-02-23 16:21:02.493 GMT [4090133] LOCATION:  CheckRecoveryConsistency, xlogrecovery.c:2122
2023-02-23 16:21:02.493 GMT [4090133] LOG:  00000: redo done at D/92000100 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-02-23 16:21:02.493 GMT [4090133] LOCATION:  PerformWalRecovery, xlogrecovery.c:1777
2023-02-23 16:21:02.556 GMT [4090131] LOG:  00000: checkpoint starting: end-of-recovery immediate wait
2023-02-23 16:21:02.556 GMT [4090131] LOCATION:  LogCheckpointStart, xlog.c:6089
2023-02-23 16:21:02.575 GMT [4090131] LOG:  00000: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.004 s, sync=0.006 s, total=0.023 s; sync files=2, longest=0.005 s, average=0.003 s; distance=16384 kB, estimate=16384 kB
2023-02-23 16:21:02.575 GMT [4090131] LOCATION:  LogCheckpointEnd, xlog.c:6170
2023-02-23 16:21:02.579 GMT [4090129] LOG:  00000: database system is ready to accept connections
The key lines in the log are the “redo starts”, “consistent recovery state reached” and “redo done” messages. They tell us redo started at “D/92000028” and finished at “D/92000100”. There was little to do (recovery-wise), as the backup was taken only moments before the target time.
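As an aside, an LSN such as D/92000028 is simply a 64-bit WAL position written as two hex halves, high/low, so the amount of WAL replayed is just the difference between the two LSNs from the log. A quick sketch using shell arithmetic:

```shell
# LSN "X/Y" = (X << 32) + Y, with X and Y in hexadecimal.
start=$(( (0xD << 32) + 0x92000028 ))   # "redo starts at D/92000028"
end=$((   (0xD << 32) + 0x92000100 ))   # "redo done at D/92000100"
echo "$(( end - start )) bytes of WAL replayed"
```

A handful of bytes, which matches the log's claim that recovery had almost nothing to do.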
The database can now accept connections:
[user01@pg-server-db01: ~ ] $: psql -d db01 -p 5442
psql (15.2)
Type "help" for help.

db01=# \dt+ public.*
                                List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+------+-------+----------+-------------+---------------+--------+-------------
 public | t1   | table | postgres | permanent   | heap          | 169 MB |
And, as if by magic, the table’s back again.
db01=# SELECT count(*) FROM public.t1;
 count
--------
 100000

db01=# SELECT * FROM public.t1 LIMIT 2;
 id |         curr_time          |              notes
----+----------------------------+----------------------------------
  1 | 2023-02-23 11:29:54.006385 | 05c4145fe20236a21f6544a673dd2b36
  2 | 2023-02-23 11:29:54.006385 | cf8ebec888adb01c8c409d76a9bbafad
Conclusion
As with most things in PostgreSQL, most of the work is invisible to the user or DBA. Restoring an instance is extremely easy, but, as usual, care must be taken. I’d never recommend performing the recovery on the same server, as it can cause too many problems. PITR is vital for a PG DBA: we don’t have flashback features like some database vendors, so this is the only way to retrieve lost data unless you have a standby with a suitable replication delay.
However, why use pg_basebackup for backups at all? Personally, I wouldn’t if I had the choice. There are many better options, including Barman, pgBackRest, WAL-G and others. The problem with pg_basebackup is not that it doesn’t work; it works, and works well. However, it does not maintain a catalogue or let you do partial backups. It is a basic tool that works for one-off backups. So, use it for one-off backups, NOT as a production backup tool.
3 Comments
Thank you for this helpful article. I’ve just one question: Is it also possible to use this method for a server migration with one big database (~4TB) with the same PSQL version?
Regards Ano
Hi Ano,
The answer is yes, you can use pg_basebackup to do a migration. It is especially easy if your new server can access the old one. I would say that the database cluster should be shut down first.
However, what I would recommend (as long as you are migrating between the same PG version) is this: use pg_basebackup to set up streaming replication, and then you can choose when to stop the primary and promote the standby.
This offers more resilience and less downtime. It is not complex to set up. If you need more help, please let us know.
Kind regards,
Matt
Note: If you want to migrate between different PG major versions (say 11 to 14), then streaming replication won’t work. For jumping versions, logical replication is needed.