TiKV uses topology labels (hereafter "labels") to declare its location information. The PD scheduler uses these labels to improve TiKV's failure tolerance. This document describes how to configure the labels.

Declare the label hierarchy in PD

Labels are hierarchical, for example, zone > rack > host. You can declare the hierarchy in the PD configuration file or with pd-ctl:

  • PD configuration file:

    [replication]
    max-replicas = 3
    location-labels = ["zone", "rack", "host"]
  • pd-ctl:

    pd-ctl >> config set location-labels zone,rack,host

    The number of machines must be no less than the value of max-replicas.

You can find all the replication configuration options here.
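
The two forms above carry the same information: the configuration file takes a TOML array, while pd-ctl takes a comma-separated list. The following Python sketch renders both from one list of label names and enforces the machine-count rule. The helper name `replication_settings` and its `machine_count` parameter are hypothetical and are not part of PD or pd-ctl.

```python
def replication_settings(max_replicas, location_labels, machine_count):
    """Render the PD config fragment and the pd-ctl command for one hierarchy.

    Hypothetical helper for illustration only; it does not talk to PD.
    """
    # Rule from the text: machines must be no fewer than max-replicas.
    if machine_count < max_replicas:
        raise ValueError(
            f"need at least {max_replicas} machines, got {machine_count}"
        )
    quoted = ", ".join(f'"{name}"' for name in location_labels)
    config_fragment = (
        "[replication]\n"
        f"max-replicas = {max_replicas}\n"
        f"location-labels = [{quoted}]"
    )
    # pd-ctl expects the same names, joined by commas, outermost layer first.
    pd_ctl_command = "config set location-labels " + ",".join(location_labels)
    return config_fragment, pd_ctl_command


fragment, command = replication_settings(
    3, ["zone", "rack", "host"], machine_count=16
)
print(fragment)
print(command)
```

Keeping the label names in a single list avoids the two forms drifting apart, for example in the order of the layers.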

Declare the labels for each TiKV

Assume that the topology has three layers: zone > rack > host. You can set a label for each layer with a command-line parameter or in the configuration file. TiKV then reports its labels to PD:

  • Command line parameter:

    tikv-server --labels zone=<zone>,rack=<rack>,host=<host>
  • TiKV configuration file:

    [server]
    labels = "zone=<zone>,rack=<rack>,host=<host>"
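
Both forms use the same `key=value,key=value` string. As a rough illustration of that format, the Python sketch below parses such a string and reports which layers of the declared hierarchy a store does not set. The helpers `parse_labels` and `missing_levels` are hypothetical; TiKV's own parsing and validation rules may differ.

```python
def parse_labels(text):
    """Parse 'zone=z1,rack=r1,host=h1' into {'zone': 'z1', ...}.

    Hypothetical helper; it only models the format shown above.
    """
    labels = {}
    for pair in text.split(","):
        key, sep, value = pair.strip().partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed label: {pair!r}")
        labels[key] = value
    return labels


def missing_levels(labels, location_labels):
    """Return the layers of the hierarchy that the store does not declare."""
    return [level for level in location_labels if level not in labels]


hierarchy = ["zone", "rack", "host"]
store = parse_labels("zone=z1,rack=r1,host=h1")
print(store)
print(missing_levels(store, hierarchy))
print(missing_levels(parse_labels("zone=z1,host=h1"), hierarchy))
```

A check like this can catch a store that omits one layer before it is started.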

Example

PD makes optimal scheduling decisions according to the topology information. You only need to decide which topology achieves the desired effect.

If you use 3 replicas and want the TiKV cluster to remain highly available even when a data zone goes down, you need at least 4 data zones.

Assume that you have 4 data zones, each zone has 2 racks, and each rack has 2 hosts. You can start one TiKV instance on each host (16 instances in total) as follows:

Start TiKV:

  # zone=z1
  tikv-server --labels zone=z1,rack=r1,host=h1
  tikv-server --labels zone=z1,rack=r1,host=h2
  tikv-server --labels zone=z1,rack=r2,host=h1
  tikv-server --labels zone=z1,rack=r2,host=h2
  # zone=z2
  tikv-server --labels zone=z2,rack=r1,host=h1
  tikv-server --labels zone=z2,rack=r1,host=h2
  tikv-server --labels zone=z2,rack=r2,host=h1
  tikv-server --labels zone=z2,rack=r2,host=h2
  # zone=z3
  tikv-server --labels zone=z3,rack=r1,host=h1
  tikv-server --labels zone=z3,rack=r1,host=h2
  tikv-server --labels zone=z3,rack=r2,host=h1
  tikv-server --labels zone=z3,rack=r2,host=h2
  # zone=z4
  tikv-server --labels zone=z4,rack=r1,host=h1
  tikv-server --labels zone=z4,rack=r1,host=h2
  tikv-server --labels zone=z4,rack=r2,host=h1
  tikv-server --labels zone=z4,rack=r2,host=h2
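
The listing is the cross product of zones, racks, and hosts, so it can be generated instead of typed by hand. The Python sketch below does this. The function name `startup_commands` is hypothetical, and the commands carry only the `--labels` flag as in the listing; any other options a real deployment needs are left out.

```python
from itertools import product

zones = ["z1", "z2", "z3", "z4"]
racks = ["r1", "r2"]
hosts = ["h1", "h2"]


def startup_commands(zones, racks, hosts):
    """Build one tikv-server command per (zone, rack, host) combination.

    Hypothetical helper; only the --labels flag from the listing is emitted.
    """
    return [
        f"tikv-server --labels zone={zone},rack={rack},host={host}"
        for zone, rack, host in product(zones, racks, hosts)
    ]


commands = startup_commands(zones, racks, hosts)
for command in commands:
    print(command)
```

With 4 zones, 2 racks per zone, and 2 hosts per rack, this yields 16 commands in the same order as the listing.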

Configure PD:

  # Use `pd-ctl` to connect to PD:
  $ pd-ctl
  >> config set location-labels zone,rack,host

Now PD will schedule replicas of the same Region to different data zones.

  • If one data zone goes down, the TiKV cluster will still be highly available.
  • If the data zone cannot recover within a period of time, PD will remove the replica from this data zone.
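
The effect of zone-level isolation can be illustrated with a toy model in Python. This is not PD's scheduling algorithm. It assumes that a Region stays available while a majority of its replicas is reachable (TiKV replicates Regions with Raft), and the helpers `survives_any_zone_failure` and `spare_zones` are hypothetical names.

```python
ZONES = ["z1", "z2", "z3", "z4"]


def survives_any_zone_failure(replica_zones):
    """True if a majority of replicas remains after any single zone is lost.

    Assumption: a Region needs a majority of its replicas to stay available.
    """
    majority = len(replica_zones) // 2 + 1
    for failed in set(replica_zones):
        alive = [zone for zone in replica_zones if zone != failed]
        if len(alive) < majority:
            return False
    return True


def spare_zones(replica_zones, failed, all_zones):
    """Healthy zones that hold no replica of this Region yet."""
    return [
        zone for zone in all_zones
        if zone != failed and zone not in replica_zones
    ]


placement = ("z1", "z2", "z3")  # one replica per zone, as PD aims for
print(survives_any_zone_failure(placement))           # True
print(survives_any_zone_failure(("z1", "z1", "z2")))  # False: losing z1 leaves 1 of 3
print(spare_zones(placement, "z1", ZONES))            # ['z4']
print(spare_zones(placement, "z1", ZONES[:3]))        # []
```

Under this model, one replica per zone keeps two of three replicas alive after any single zone failure. One plausible reading of the four-zone requirement is that a fourth zone leaves a healthy zone without a replica, where the lost replica could be recreated; with only three zones no such zone exists.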