Advanced StatefulSet Controller

The built-in Kubernetes StatefulSet assigns consecutive serial numbers to Pods. For example, with three replicas, the Pods are named pod-0, pod-1, and pod-2. Scaling out always adds a Pod at the end of the sequence, and scaling in always deletes the last Pod: scaling out to four replicas adds pod-3, and scaling in to two replicas deletes pod-2.

When you use local storage, Pods are bound to the storage resources of specific Nodes and cannot be scheduled freely. Suppose you want to delete a Pod in the middle of the sequence so that its Node can be maintained, and the Pod cannot be migrated to another Node. Or suppose you want to delete a failed Pod and create a replacement with a different serial number. The built-in StatefulSet supports neither operation.

The advanced StatefulSet controller is built on the built-in StatefulSet controller and lets you control the serial numbers of Pods freely. This document describes how to use the advanced StatefulSet controller in TiDB Operator.

Enable

  1. Load the Advanced StatefulSet CRD file:

    • For Kubernetes versions < 1.16:

      kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.2/manifests/advanced-statefulset-crd.v1beta1.yaml
    • For Kubernetes versions >= 1.16:

      kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.2/manifests/advanced-statefulset-crd.v1.yaml
  2. Enable the AdvancedStatefulSet feature in values.yaml of the TiDB Operator chart:

    features:
      - AdvancedStatefulSet=true
    advancedStatefulset:
      create: true

    Upgrade TiDB Operator. For details, refer to Upgrade TiDB Operator.
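
The choice between the two CRD manifests in step 1 can be scripted. The following is a minimal sketch, not part of TiDB Operator: asts_crd_url is a hypothetical helper that takes a Kubernetes version string and prints one of the two manifest URLs listed above.

```shell
# Hypothetical helper: print the CRD manifest URL from step 1 that matches
# a Kubernetes version such as "1.15", "1.16+", or "v1.21.3".
asts_crd_url() {
  local version="${1#v}"            # drop an optional leading "v"
  local base="https://raw.githubusercontent.com/pingcap/tidb-operator/v1.3.2/manifests"
  local major="${version%%.*}"      # text before the first dot
  local minor="${version#*.}"       # text after the first dot
  minor="${minor%%[!0-9]*}"         # keep only the leading digits of the minor version
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 16 ]; }; then
    echo "${base}/advanced-statefulset-crd.v1.yaml"
  else
    echo "${base}/advanced-statefulset-crd.v1beta1.yaml"
  fi
}
```

For example, kubectl apply -f "$(asts_crd_url 1.21)" would apply the v1 manifest.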

Note

If the AdvancedStatefulSet feature is enabled, TiDB Operator converts the current StatefulSet object into an AdvancedStatefulSet object. However, after the AdvancedStatefulSet feature is disabled, the AdvancedStatefulSet object cannot be automatically converted to the built-in StatefulSet object of Kubernetes.

Usage

This section describes how to use the advanced StatefulSet controller.

View the AdvancedStatefulSet object using kubectl

AdvancedStatefulSet uses the same data format as StatefulSet, but it is implemented as a CRD, with asts as its alias. To view the AdvancedStatefulSet objects in a namespace, run the following command:

  kubectl get -n ${namespace} asts

Specify the Pod to be scaled in

With the advanced StatefulSet controller, when you scale in a TidbCluster, you can not only reduce the number of replicas but also choose which Pods of the PD, TiDB, or TiKV components to remove, by configuring annotations.

For example:

  apiVersion: pingcap.com/v1alpha1
  kind: TidbCluster
  metadata:
    name: asts
  spec:
    version: v5.4.0
    timezone: UTC
    pvReclaimPolicy: Delete
    pd:
      baseImage: pingcap/pd
      maxFailoverCount: 0
      replicas: 3
      requests:
        storage: "1Gi"
      config: {}
    tikv:
      baseImage: pingcap/tikv
      maxFailoverCount: 0
      replicas: 4
      requests:
        storage: "1Gi"
      config: {}
    tidb:
      baseImage: pingcap/tidb
      maxFailoverCount: 0
      replicas: 2
      service:
        type: ClusterIP
      config: {}

The above configuration deploys four TiKV instances. Because the cluster is named asts, they are asts-tikv-0, asts-tikv-1, asts-tikv-2, and asts-tikv-3. To delete asts-tikv-1, set spec.tikv.replicas to 3 and configure the following annotation:

  metadata:
    annotations:
      tikv.tidb.pingcap.com/delete-slots: '[1]'

Note

Modify replicas and the delete-slots annotation in the same operation. Otherwise, the controller acts on each change separately according to the regular scaling rules. For example, reducing replicas alone removes the Pod with the highest serial number.

The complete example is as follows:

  apiVersion: pingcap.com/v1alpha1
  kind: TidbCluster
  metadata:
    annotations:
      tikv.tidb.pingcap.com/delete-slots: '[1]'
    name: asts
  spec:
    version: v5.4.0
    timezone: UTC
    pvReclaimPolicy: Delete
    pd:
      baseImage: pingcap/pd
      maxFailoverCount: 0
      replicas: 3
      requests:
        storage: "1Gi"
      config: {}
    tikv:
      baseImage: pingcap/tikv
      maxFailoverCount: 0
      replicas: 3
      requests:
        storage: "1Gi"
      config: {}
    tidb:
      baseImage: pingcap/tidb
      maxFailoverCount: 0
      replicas: 2
      service:
        type: ClusterIP
      config: {}
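
If you change a running cluster instead of re-applying the full manifest, one way to update replicas and the annotation in a single operation is a JSON merge patch. The following is a minimal sketch under that assumption: scale_patch is a hypothetical helper, not part of TiDB Operator, and it only prints the patch body.

```shell
# Hypothetical helper: build a JSON merge patch that changes replicas and the
# delete-slots annotation of one component in a single update.
scale_patch() {
  local component="$1"   # pd, tidb, or tikv
  local replicas="$2"    # new value of spec.<component>.replicas
  local slots="$3"       # JSON integer array, for example [1] or []
  printf '{"metadata":{"annotations":{"%s.tidb.pingcap.com/delete-slots":"%s"}},"spec":{"%s":{"replicas":%s}}}\n' \
    "$component" "$slots" "$component" "$replicas"
}

# Print the patch for the example above: TiKV replicas 4 -> 3, removing slot 1.
scale_patch tikv 3 '[1]'
```

The printed patch could then be passed to kubectl, for example kubectl patch tidbcluster asts -n ${namespace} --type merge -p "$(scale_patch tikv 3 '[1]')". Because both fields travel in one patch, the controller never sees one change without the other.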

The supported annotations are as follows:

  • pd.tidb.pingcap.com/delete-slots: Specifies the serial numbers of the Pods to be deleted in the PD component.
  • tidb.tidb.pingcap.com/delete-slots: Specifies the serial numbers of the Pods to be deleted in the TiDB component.
  • tikv.tidb.pingcap.com/delete-slots: Specifies the serial numbers of the Pods to be deleted in the TiKV component.

The value of each annotation is a JSON array of integers, such as [0], [0,1], or [1,3].
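
To see which serial numbers remain for a given combination of replicas and delete-slots, the following sketch may help. It assumes that the controller keeps the lowest replicas non-negative serial numbers that are not listed in delete-slots. That rule is an assumption consistent with the TiKV example above, and remaining_ordinals is a hypothetical helper, not a TiDB Operator command.

```shell
# Hypothetical helper: print the serial numbers that remain for a replica count
# and a list of deleted slots.
# Assumed rule: keep the lowest <replicas> non-negative serial numbers that are
# not listed in delete-slots.
remaining_ordinals() {
  local replicas="$1"; shift
  local deleted=" $* "              # space-padded list of deleted slots
  local ordinal=0 count=0 result=""
  while [ "$count" -lt "$replicas" ]; do
    case "$deleted" in
      *" $ordinal "*) ;;            # slot is listed in delete-slots: skip it
      *) result="${result:+$result }$ordinal"; count=$((count + 1)) ;;
    esac
    ordinal=$((ordinal + 1))
  done
  echo "$result"
}

remaining_ordinals 3 1   # replicas: 3, delete-slots: [1]  -> prints "0 2 3"
remaining_ordinals 4     # replicas: 4, delete-slots: []   -> prints "0 1 2 3"
```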

Specify the location to scale out

To restore asts-tikv-1, reverse the scale-in operation above: set spec.tikv.replicas back to 4 and clear the delete-slots annotation.

Note

Like regular StatefulSet scaling, scaling a specified Pod with the advanced StatefulSet controller does not delete the Persistent Volume Claims (PVCs) associated with that Pod. If you do not want the restored Pod to reuse the previous data, delete the associated PVCs before scaling out at the original location.

For example:

  apiVersion: pingcap.com/v1alpha1
  kind: TidbCluster
  metadata:
    annotations:
      tikv.tidb.pingcap.com/delete-slots: '[]'
    name: asts
  spec:
    version: v5.4.0
    timezone: UTC
    pvReclaimPolicy: Delete
    pd:
      baseImage: pingcap/pd
      maxFailoverCount: 0
      replicas: 3
      requests:
        storage: "1Gi"
      config: {}
    tikv:
      baseImage: pingcap/tikv
      maxFailoverCount: 0
      replicas: 4
      requests:
        storage: "1Gi"
      config: {}
    tidb:
      baseImage: pingcap/tidb
      maxFailoverCount: 0
      replicas: 2
      service:
        type: ClusterIP
      config: {}

The delete-slots annotation can be set to an empty array, as in the example above, or removed entirely.
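
If the restored Pod should not reuse its previous data, the PVC left behind by the earlier Pod must be deleted before scaling out. The following sketch assumes that PD and TiKV PVCs are named <component>-<cluster>-<component>-<serial number>. This naming pattern is an assumption, so check it against the output of kubectl get pvc. pvc_name is a hypothetical helper.

```shell
# Hypothetical helper: print the PVC name assumed for a PD or TiKV Pod.
# Assumption: PVCs are named <component>-<cluster>-<component>-<serial number>.
pvc_name() {
  local cluster="$1" component="$2" ordinal="$3"
  echo "${component}-${cluster}-${component}-${ordinal}"
}

# Name assumed for slot 1 of the TiKV component in the asts cluster.
pvc_name asts tikv 1   # prints "tikv-asts-tikv-1"
```

After confirming the name with kubectl get pvc -n ${namespace}, the PVC could be removed with kubectl delete pvc -n ${namespace} "$(pvc_name asts tikv 1)".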