PG Backup & Restore

Backup & Recovery matters.

Failures can be divided into two categories: hardware/resource failures and software/human errors.

Replication addresses the former, while delayed replicas & cold backups address the latter.

Pigsty has complete support for backups. Physical replication comes battery-included and is configured by simply declaring replica & delayed roles, and handy scripts are provided for making cold backups.

Physical Replica

In Pigsty, physical replicas are created by specifying roles (pg_role) for the database instances. For example, the following configuration declares an HA database cluster with one primary & two replicas.

```yaml
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary } # Primary
    10.10.10.12: { pg_seq: 2, pg_role: replica } # Hot standby
    10.10.10.13: { pg_seq: 3, pg_role: offline } # Warm standby
  vars:
    pg_cluster: pg-test
```

Hot Standby

replica = Hot Standby, which carries read-only traffic and stays in near-real-time sync with the primary, with a small replication delay.

It is consistent with the primary: it will take over when the primary fails, and it also serves online read-only traffic. A hot standby that keeps up with the primary in real time via synchronous replication can also be called a sync standby. Under normal circumstances, physical replication latency falls in the range of 1ms-100ms / tens of KB to several MB, depending on network conditions and load level.
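To see how far each replica actually lags, you can query the standard PostgreSQL pg_stat_replication view on the primary (this is a generic PostgreSQL query, not a Pigsty-specific interface):

```sql
-- Run on the primary: per-replica replication state and replay lag in bytes
SELECT application_name, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

A sync standby shows sync_state = 'sync', while ordinary hot standbys show 'async'.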

Please refer to Classic Physical Replication.

Warm Standby

offline = Warm Standby, which does not carry online traffic. It serves as a backup, or for offline/analysis queries only.

Please refer to offline deployment.

Sync Standby

standby = Sync Standby, which stays in strict real-time sync with the primary.

It replicates with synchronous commit, hence the name sync standby. Please refer to sync standby deployment for details.
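Synchronous commit is coordinated by Patroni. As a sketch (assuming Patroni's standard synchronous-mode settings, applied via pg edit-config <cluster>), enabling it looks like:

```yaml
# Patroni DCS config: elect one replica as a synchronous standby
synchronous_mode: true          # Patroni picks and maintains a sync standby
synchronous_mode_strict: false  # if true, writes block when no sync standby is available
```

The trade-off is durability versus availability: strict mode guarantees zero data loss on failover at the cost of blocking commits when no standby is alive.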

Delayed Replica

A delayed replica is a quick remedy for software failures and human errors. It receives changes in real time from the primary through the standard primary-replica stream replication mechanism, but delays applying them for a specified period (e.g., one hour, one day). It is therefore a copy of the primary's historical state. When a problem such as accidental data deletion occurs, the delay provides a time window for salvage: immediately query the data from the delayed replica and backfill the original primary.

A delayed replica can be created using the standby cluster feature. For example, suppose you want a delayed replica for the pg-test cluster: pg-testdelay, holding the state of pg-test from 1 hour ago. If data is deleted by mistake, it can be retrieved from the delayed replica immediately and poured back into the original cluster.

```yaml
# pg-test is the original database cluster
pg-test:
  hosts:
    10.10.10.11: { pg_seq: 1, pg_role: primary }
  vars:
    pg_cluster: pg-test
    pg_version: 14

# pg-testdelay will be used as a delayed replica of pg-test
pg-testdelay:
  hosts:
    10.10.10.12: { pg_seq: 1, pg_role: primary , pg_upstream: 10.10.10.11 } # The actual role is Standby Leader
  vars:
    pg_cluster: pg-testdelay
    pg_version: 14
```

After creation, edit the Patroni config of the delayed cluster with pg edit-config pg-testdelay on the meta node, and set standby_cluster.recovery_min_apply_delay to the delay you expect.

```yaml
 standby_cluster:
   create_replica_methods:
     - basebackup
   host: 10.10.10.11
   port: 5432
+  recovery_min_apply_delay: 1h
```
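Once the setting takes effect, you can confirm the delay on the delayed replica itself using a standard PostgreSQL function:

```sql
-- Run on the delayed replica: time since the last applied transaction,
-- which should hover around the configured recovery_min_apply_delay
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

On a 1h-delayed replica with continuous write traffic upstream, this should report roughly one hour.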

Cold Backup

Cold backup is the final safeguard for your data assets.

A cold backup exists as static files: a binary copy of the database data directory. Cold backups are the last resort in case of accidental deletion of databases or tables, or catastrophic failure of the whole cluster or server room.

Pigsty provides the pg-backup script for making cold backups. Executed as dbsu on a database node, it creates a full physical backup of the current instance and places it in /pg/backup (by default located in {{ pg_fs_bkup }}/backup).

With parameters, you can specify the backup database URL, backup-dir, file name, encryption method, retention policy for existing backups, etc.

```bash
$ pg-backup # Execute the backup script without any arguments
[2021-08-05 17:41:35][INFO] ================================================================
[2021-08-05 17:41:35][INFO] [INIT] pg-backup begin, checking parameters
[2021-08-05 17:41:35][DEBUG] [INIT] #====== BINARY
[2021-08-05 17:41:35][DEBUG] [INIT] pg_basebackup : /usr/pgsql/bin/pg_basebackup
[2021-08-05 17:41:35][DEBUG] [INIT] openssl : /bin/openssl
[2021-08-05 17:41:35][DEBUG] [INIT] #====== PARAMETER
[2021-08-05 17:41:35][DEBUG] [INIT] filename (-f) : backup_pg-meta_20210805.tar.lz4
[2021-08-05 17:41:35][DEBUG] [INIT] src (-s) : postgres:///
[2021-08-05 17:41:35][DEBUG] [INIT] dst (-d) : /pg/backup
[2021-08-05 17:41:35][DEBUG] [INIT] tag (-t) : pg-meta
[2021-08-05 17:41:35][DEBUG] [INIT] key (-k) : pg-meta
[2021-08-05 17:41:35][DEBUG] [INIT] encrypt (-e) : false
[2021-08-05 17:41:35][DEBUG] [INIT] upload (-u) : false
[2021-08-05 17:41:35][DEBUG] [INIT] remove (-r) : -mmin +1200
[2021-08-05 17:41:35][INFO] [LOCK] acquire lock @ /tmp/backup.lock
[2021-08-05 17:41:35][INFO] [LOCK] lock acquired success on /tmp/backup.lock, pid=25438
[2021-08-05 17:41:35][INFO] [BKUP] backup begin, from postgres:/// to /pg/backup/backup_pg-meta_20210805.tar.lz4
[2021-08-05 17:41:35][INFO] [BKUP] backup in normal mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/6B000028 on timeline 1
pg_basebackup: write-ahead log end point: 0/6B000138
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[2021-08-05 17:41:45][INFO] [BKUP] backup complete!
[2021-08-05 17:41:45][INFO] [RMBK] remove local obsolete backup: 1200
[2021-08-05 17:41:45][INFO] [BKUP] find obsolete backups: find /pg/backup/ -maxdepth 1 -type f -mmin +1200 -name 'backup*.lz4'
[2021-08-05 17:41:45][WARN] [BKUP] remove obsolete backups:
[2021-08-05 17:41:45][INFO] [RMBK] remove old backup complete
[2021-08-05 17:41:45][INFO] [LOCK] release lock @ /tmp/backup.lock
[2021-08-05 17:41:45][INFO] [DONE] backup procdure complete!
[2021-08-05 17:41:45][INFO] ================================================================
```

This script uses pg_basebackup to take a backup from the specified PGURL (the local database instance by default), producing a tar archive compressed with lz4, with optional openssl RC4 stream encryption.

The backup file is placed in the /pg/backup/ dir by default, and the default file name consists of a prefix, cluster name, and date, e.g., backup_pg-meta_20210805.tar.lz4.
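The naming convention above can be sketched in shell (filename construction only; the cluster name here is just an example):

```shell
pg_cluster="pg-meta"                                     # cluster name, used as the tag
filename="backup_${pg_cluster}_$(date +%Y%m%d).tar.lz4"  # prefix + cluster + date
echo "${filename}"                                       # e.g. backup_pg-meta_20210805.tar.lz4
```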

The default cleanup policy removes backup files older than 1200 minutes (20 hours) when the latest backup completes.
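The retention rule boils down to a find expression with -mmin. A self-contained sketch in a scratch directory (hypothetical file names; GNU touch -d assumed):

```shell
backup_dir=$(mktemp -d)                                # scratch dir standing in for /pg/backup
touch "${backup_dir}/backup_pg-meta_20210805.tar.lz4"  # fresh backup, just created
touch -d '25 hours ago' "${backup_dir}/backup_pg-meta_20210804.tar.lz4"  # obsolete backup
# Same expression the script uses: regular files older than 1200 minutes (20 hours)
obsolete=$(find "${backup_dir}" -maxdepth 1 -type f -mmin +1200 -name 'backup*.lz4')
echo "${obsolete}"   # only the 25-hour-old file matches
```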

Restoring from cold backup

To restore from this backup, put the cluster into maintenance mode (pg pause), stop the cluster primary, and empty its data directory. Then unpack the backup file into /pg/data.

```bash
# Find the latest backup file and print its name
backup_dir="/pg/backup"
data_dir="/pg/data"
backup_latest=$(ls -t ${backup_dir} | head -n1)
echo "backup ${backup_latest} will be used"

# Pause Patroni, shut down the database, and empty the data directory (dangerous)
pg pause pg-meta
pg_ctl -D /pg/data stop
rm -rf /pg/data/*   # empty the data directory (dangerous)

# Unpack the backup into the data directory
echo "unlz4 -d -c ${backup_dir}/${backup_latest} | tar -xC ${data_dir}"
unlz4 -d -c ${backup_dir}/${backup_latest} | tar -xC ${data_dir}

# Optional: if the backup was encrypted with a password, decrypt before unpacking instead
openssl enc -rc4 -d -k ${PASSWORD} -in ${backup_dir}/${backup_latest} | unlz4 -d -c | tar -xC ${data_dir}

# Bring the database back up
systemctl restart patroni
pg resume pg-meta   # exit maintenance mode

# Redo the other replicas of the cluster
pg reinit <cluster> # reset the other instance members of the cluster in turn
```
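Before touching /pg/data, you can rehearse the pack/unpack pipeline safely in a scratch directory (requires the lz4 package; all paths here are hypothetical stand-ins):

```shell
workdir=$(mktemp -d)
mkdir -p "${workdir}/data" "${workdir}/restore"
echo 'hello' > "${workdir}/data/somefile"                 # stand-in for a data file
# Pack: the same tar + lz4 pipeline pg-backup produces
tar -C "${workdir}" -cf - data | lz4 -q > "${workdir}/backup_test.tar.lz4"
# Unpack: the same pipeline as the restore procedure above
unlz4 -d -c "${workdir}/backup_test.tar.lz4" | tar -xC "${workdir}/restore"
cat "${workdir}/restore/data/somefile"
```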

Other handy tools, such as pgBackRest & pg_probackup, can also be used to manage backups.

Last modified 2022-06-04: fill en docs (5a858d3)