PD Recover is a disaster recovery tool for PD. Use it to recover a PD cluster that cannot start or provide services normally.

Source code compiling

  • Go version 1.11 or later is required.
  • In the root directory of the PD project, run the make command to compile the source code and generate bin/pd-recover.

Usage

This section describes how to use PD Recover to recover a PD cluster that cannot start or provide services normally.

Flags description

  • -alloc-id uint: Specify a number larger than the allocated ID of the original cluster.
  • -cacert string: Specify the path to the trusted CA certificate file in PEM format.
  • -cert string: Specify the path to the SSL certificate file in PEM format.
  • -key string: Specify the path to the SSL certificate key file in PEM format, which is the private key of the certificate specified by `-cert`.
  • -cluster-id uint: Specify the Cluster ID of the original cluster.
  • -endpoints string: Specify the PD address (default: "http://127.0.0.1:2379").
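
As an illustration of how these flags combine, the following sketch assembles a plain and a TLS-enabled command line. The Cluster ID, the Alloc ID, the https endpoint, and the certificate paths are hypothetical placeholders, and the binary path assumes that you run the command from the root directory of the PD project after compiling.

```shell
#!/bin/sh
# Hypothetical values: substitute the ones from your own cluster.
CLUSTER_ID=6747551640615446306
ALLOC_ID=10000
PD_ENDPOINT="http://127.0.0.1:2379"   # the documented default address

# Plain (non-TLS) invocation, using only the flags described above.
CMD="./bin/pd-recover -endpoints ${PD_ENDPOINT} -cluster-id ${CLUSTER_ID} -alloc-id ${ALLOC_ID}"

# TLS variant: the https endpoint and the certificate paths are placeholders.
TLS_CMD="./bin/pd-recover -endpoints https://127.0.0.1:2379 -cluster-id ${CLUSTER_ID} -alloc-id ${ALLOC_ID} -cacert /path/to/ca.pem -cert /path/to/client.pem -key /path/to/client-key.pem"

# Print the commands for review; run one of them only after checking the values.
echo "${CMD}"
echo "${TLS_CMD}"
```

Printing the command before running it gives you a chance to double-check cluster-id and alloc-id, because a wrong value is hard to correct after recovery.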

Recovery flow

  • Obtain the Cluster ID and the Alloc ID from the current cluster.

    • Obtain the Cluster ID from the PD and TiKV logs.
    • Obtain the largest allocated Alloc ID from either the PD log or the Metadata Information in the PD monitoring panel. The value you specify for alloc-id must be larger than the current largest Alloc ID. If you cannot obtain the Alloc ID, estimate a larger number from the number of Regions and stores in the cluster. Generally, you can specify a number that is several orders of magnitude larger.
  • Stop the whole cluster, clear the PD data directory, and restart the PD cluster.

  • Use PD Recover to recover the cluster, and make sure that you specify the correct cluster-id and an appropriate alloc-id.

  • After PD Recover reports that the recovery succeeded, restart the whole cluster.
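
The choice of alloc-id in the first step can be sketched as a small helper. This is only an illustration under stated assumptions: the function name choose_alloc_id, the 10x margin, and the 1000x scaling factor are arbitrary choices made for the example, not values prescribed by PD. The only hard requirement from the flow above is that the result is larger than the current largest Alloc ID.

```shell
#!/bin/sh
# choose_alloc_id MAX_SEEN REGIONS STORES  (hypothetical helper)
#   MAX_SEEN: largest Alloc ID found in the PD log or the monitoring panel,
#             or an empty string if it could not be obtained.
#   REGIONS, STORES: approximate number of Regions and stores in the cluster.
choose_alloc_id() {
  max_seen="$1"
  regions="$2"
  stores="$3"
  if [ -n "$max_seen" ]; then
    # Known case: go well above the largest allocated ID (10x is an arbitrary margin).
    echo $(( max_seen * 10 ))
  else
    # Unknown case: scale the object count up by three orders of magnitude.
    echo $(( (regions + stores) * 1000 ))
  fi
}

# Example calls with made-up numbers.
choose_alloc_id 4000 20000 5
choose_alloc_id "" 20000 5
```

A generous margin is cheap here, because alloc-id only has to exceed every ID that the original cluster already handed out.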