Deploy TiDB on AWS EKS

This document describes how to deploy a TiDB cluster on AWS Elastic Kubernetes Service (EKS).

To deploy TiDB Operator and the TiDB cluster in a self-managed Kubernetes environment, refer to Deploy TiDB Operator and Deploy TiDB in General Kubernetes.

Prerequisites

Before deploying a TiDB cluster on AWS EKS, make sure the following requirements are satisfied:

  • Install Helm 3: used for deploying TiDB Operator.

  • Complete all operations in Getting started with eksctl.

    This guide covers the following tasks:

    • Install and configure awscli.
    • Install and configure eksctl used for creating Kubernetes clusters.
    • Install kubectl.

To verify whether AWS CLI is configured correctly, run the aws configure list command. If the output shows the values for access_key and secret_key, AWS CLI is configured correctly. Otherwise, you need to re-configure AWS CLI.
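
For example, a correctly configured environment produces output similar to the following (the values are masked here, and the profile, keys, and region will differ in your environment):

  aws configure list
        Name                    Value             Type    Location
        ----                    -----             ----    --------
     profile                <not set>             None    None
  access_key     ****************ABCD      shared-credentials-file
  secret_key     ****************WXYZ      shared-credentials-file
      region           ap-northeast-1      config-file    ~/.aws/config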

Note

The operations described in this document require at least the minimum privileges needed by eksctl and the service privileges needed to create a Linux bastion host.

  • Instance types: to gain better performance, the following instance types are recommended:
    • PD nodes: c5.xlarge
    • TiDB nodes: c5.2xlarge
    • TiKV or TiFlash nodes: r5b.2xlarge
  • Storage: Because AWS supports the EBS gp3 volume type, it is recommended to use EBS gp3. For gp3 provisioning, the following configuration is recommended:
    • TiKV: 400 MiB/s, 4000 IOPS
    • TiFlash: 625 MiB/s, 6000 IOPS
  • AMI type: Amazon Linux 2

Create an EKS cluster and a node pool

According to the recommendations in the AWS official blog and the EKS best practice document, since most of the TiDB cluster components use EBS volumes as storage, it is recommended to create a node pool in each availability zone (at least 3 in total) for each component when creating the EKS cluster.

Save the following configuration as the cluster.yaml file. Replace ${clusterName} with your desired cluster name. The cluster and node group names should match the regular expression [a-zA-Z][-a-zA-Z0-9]*, so avoid names that contain _.

  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: ${clusterName}
    region: ap-northeast-1
  nodeGroups:
    - name: admin
      desiredCapacity: 1
      privateNetworking: true
      labels:
        dedicated: admin
    - name: tidb-1a
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1a"]
      instanceType: c5.2xlarge
      labels:
        dedicated: tidb
      taints:
        dedicated: tidb:NoSchedule
    - name: tidb-1d
      desiredCapacity: 0
      privateNetworking: true
      availabilityZones: ["ap-northeast-1d"]
      instanceType: c5.2xlarge
      labels:
        dedicated: tidb
      taints:
        dedicated: tidb:NoSchedule
    - name: tidb-1c
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1c"]
      instanceType: c5.2xlarge
      labels:
        dedicated: tidb
      taints:
        dedicated: tidb:NoSchedule
    - name: pd-1a
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1a"]
      instanceType: c5.xlarge
      labels:
        dedicated: pd
      taints:
        dedicated: pd:NoSchedule
    - name: pd-1d
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1d"]
      instanceType: c5.xlarge
      labels:
        dedicated: pd
      taints:
        dedicated: pd:NoSchedule
    - name: pd-1c
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1c"]
      instanceType: c5.xlarge
      labels:
        dedicated: pd
      taints:
        dedicated: pd:NoSchedule
    - name: tikv-1a
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1a"]
      instanceType: r5b.2xlarge
      labels:
        dedicated: tikv
      taints:
        dedicated: tikv:NoSchedule
    - name: tikv-1d
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1d"]
      instanceType: r5b.2xlarge
      labels:
        dedicated: tikv
      taints:
        dedicated: tikv:NoSchedule
    - name: tikv-1c
      desiredCapacity: 1
      privateNetworking: true
      availabilityZones: ["ap-northeast-1c"]
      instanceType: r5b.2xlarge
      labels:
        dedicated: tikv
      taints:
        dedicated: tikv:NoSchedule

By default, only two TiDB nodes are required, so you can set the desiredCapacity of the tidb-1d node group to 0. You can scale out this node group any time if necessary.
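
For example, if you later need the third TiDB node, you can scale the tidb-1d group out with a command like the following (a sketch; adjust the cluster name and node counts to your setup):

  eksctl scale nodegroup --cluster ${clusterName} --name tidb-1d --nodes 1 --nodes-min 1 --nodes-max 1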

Execute the following command to create the cluster:

  eksctl create cluster -f cluster.yaml

After executing the command above, you need to wait until the EKS cluster is successfully created and the node groups are created and added to the EKS cluster. This process might take 5 to 20 minutes. For more cluster configuration options, refer to the eksctl documentation.
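
Once the cluster is ready, you can check that all node groups have joined and carry the expected dedicated labels, for example:

  kubectl get nodes -L dedicated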

Warning

If the Regional Auto Scaling Group (ASG) is used:

Configure StorageClass

This section describes how to configure the storage class for different storage types. These storage types are:

  • The default gp2 storage type after creating the EKS cluster.
  • The gp3 storage type (recommended) or other EBS storage types.
  • The local storage used for testing bare-metal performance.

Configure gp2

After you create an EKS cluster, the default StorageClass is gp2. To improve I/O write performance, it is recommended to configure nodelalloc and noatime in the mountOptions field of the StorageClass resource.

  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  # ...
  mountOptions:
    - nodelalloc,noatime

For more information on the mount options, see TiDB Environment and System Configuration Check.
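
You can list the StorageClasses available in your cluster, and see which one is currently the default, with the following command:

  kubectl get storageclass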

If you do not want to use the default gp2 storage type, you can create a StorageClass for another storage type. For example, you can use the gp3 (recommended) or io1 storage type.

The following example shows how to create and configure a StorageClass for the gp3 storage type:

  1. Deploy the AWS EBS Container Storage Interface (CSI) driver on the EKS cluster. If you are using a storage type other than gp3, skip this step.

  2. Set ebs-csi-node toleration.

    kubectl patch -n kube-system ds ebs-csi-node -p '{"spec":{"template":{"spec":{"tolerations":[{"operator":"Exists"}]}}}}'

    Expected output:

    daemonset.apps/ebs-csi-node patched
  3. Create a StorageClass resource. In the resource definition, specify your desired storage type in the parameters.type field.

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: gp3
    provisioner: ebs.csi.aws.com
    allowVolumeExpansion: true
    volumeBindingMode: WaitForFirstConsumer
    parameters:
      type: gp3
      fsType: ext4
      iops: "4000"
      throughput: "400"
    mountOptions:
      - nodelalloc,noatime
  4. In the TidbCluster YAML file, configure gp3 in the storageClassName field. For example:

    spec:
      tikv:
        ...
        storageClassName: gp3
  5. To improve I/O write performance, it is recommended to configure nodelalloc and noatime in the mountOptions field of the StorageClass resource.

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    # ...
    mountOptions:
      - nodelalloc,noatime

    For more information on the mount options, see TiDB Environment and System Configuration Check.

For more information on the EBS storage types and configuration, refer to Amazon EBS volume types and Storage Classes.
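
To confirm that the gp3 StorageClass has been created, you can list it with kubectl. The output below is only a sketch and may differ depending on your cluster and driver version:

  kubectl get storageclass gp3
  NAME   PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
  gp3    ebs.csi.aws.com   Delete          WaitForFirstConsumer   true                   1m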

Configure local storage

Local storage is used for testing bare-metal performance. For higher IOPS and lower latency, you can choose NVMe SSD volumes offered by some AWS instances for the TiKV node pool. However, for the production environment, use AWS EBS as your storage type.

Note

  • You cannot dynamically change StorageClass for a running TiDB cluster. For testing purposes, create a new TiDB cluster with the desired StorageClass.
  • EKS upgrade or other reasons might cause node reconstruction. In such cases, data in the local storage might be lost. To avoid data loss, you need to back up TiKV data before node reconstruction.
  • To avoid data loss from node reconstruction, you can refer to AWS documentation and disable the ReplaceUnhealthy feature of the TiKV node group.

For instance types that provide NVMe SSD volumes, check out Amazon EC2 Instance Types.

The following c5d.4xlarge example shows how to configure StorageClass for the local storage:

  1. Create a node group with local storage for TiKV.

    1. In the eksctl configuration file, modify the instance type of the TiKV node group to c5d.4xlarge:

      - name: tikv-1a
        desiredCapacity: 1
        privateNetworking: true
        availabilityZones: ["ap-northeast-1a"]
        instanceType: c5d.4xlarge
        labels:
          dedicated: tikv
        taints:
          dedicated: tikv:NoSchedule
        ...
    2. Create a node group with local storage:

      eksctl create nodegroups -f cluster.yaml

    If the TiKV node group already exists, to avoid name conflict, you can take either of the following actions:

    • Delete the old group and create a new one.
    • Change the group name.
  2. Deploy local volume provisioner.

    1. To conveniently discover and manage local storage volumes, install local-volume-provisioner.

    2. Mount the local storage to the /mnt/ssd directory.

    3. According to the mounting configuration, modify the local-volume-provisioner.yaml file.

    4. Deploy and create a local-storage storage class using the modified local-volume-provisioner.yaml file.

      kubectl apply -f <local-volume-provisioner.yaml>
  3. Use the local storage.

    After you complete the previous step, local-volume-provisioner can discover all the local NVMe SSD volumes in the cluster.

After local-volume-provisioner discovers the local volumes, when you Deploy a TiDB cluster and the monitoring component, you need to add the tikv.storageClassName field to tidb-cluster.yaml and set the field value to local-storage.
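
Before deploying the cluster, you can verify that local-volume-provisioner has discovered the NVMe disks by listing the PersistentVolumes of the local-storage class, for example:

  kubectl get pv | grep local-storage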

Deploy TiDB Operator

To deploy TiDB Operator in the EKS cluster, refer to the Deploy TiDB Operator section in Getting Started.

Deploy a TiDB cluster and the monitoring component

This section describes how to deploy a TiDB cluster and its monitoring component in AWS EKS.

Create namespace

To create a namespace to deploy the TiDB cluster, run the following command:

  kubectl create namespace tidb-cluster

Note

A namespace is a virtual cluster backed by the same physical cluster. This document takes tidb-cluster as an example. If you want to use another namespace, modify the corresponding arguments of -n or --namespace.

Deploy

First, download the sample TidbCluster and TidbMonitor configuration files:

  curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/aws/tidb-cluster.yaml && \
  curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/aws/tidb-monitor.yaml

To further customize and configure the CR before applying it, refer to Configure the TiDB cluster.

Note

By default, the configuration in tidb-cluster.yaml sets up the LoadBalancer for TiDB with the “internal” scheme. This means that the LoadBalancer is only accessible within the VPC, not externally. To access TiDB over the MySQL protocol, you need to use a bastion host or use kubectl port-forward. If you want to expose TiDB over the internet and if you are aware of the risks of doing this, you can change the scheme for the LoadBalancer from “internal” to “internet-facing” in the tidb-cluster.yaml file.
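
For a quick functional test once the cluster is running, kubectl port-forward can stand in for the bastion host. The following sketch assumes the sample tidb-cluster.yaml, which names the cluster basic, so the TiDB service is basic-tidb:

  # Forward local port 4000 to the TiDB service inside the cluster
  kubectl port-forward -n tidb-cluster svc/basic-tidb 4000:4000

You can then connect from the same machine with mysql --comments -h 127.0.0.1 -P 4000 -u root.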

To deploy the TidbCluster and TidbMonitor CR in the EKS cluster, run the following command:

  kubectl apply -f tidb-cluster.yaml -n tidb-cluster && \
  kubectl apply -f tidb-monitor.yaml -n tidb-cluster

After the YAML files above are applied to the Kubernetes cluster, TiDB Operator creates the desired TiDB cluster and its monitoring component according to the files.

Note

If you need to deploy a TiDB cluster on ARM64 machines, refer to Deploy a TiDB Cluster on ARM64 Machines.

View the cluster status

To view the status of the starting TiDB cluster, run the following command:

  kubectl get pods -n tidb-cluster

When all the Pods are in the Running or Ready state, the TiDB cluster is successfully started. For example:

  NAME                              READY   STATUS    RESTARTS   AGE
  tidb-discovery-5cb8474d89-n8cxk   1/1     Running   0          47h
  tidb-monitor-6fbcc68669-dsjlc     3/3     Running   0          47h
  tidb-pd-0                         1/1     Running   0          47h
  tidb-pd-1                         1/1     Running   0          46h
  tidb-pd-2                         1/1     Running   0          46h
  tidb-tidb-0                       2/2     Running   0          47h
  tidb-tidb-1                       2/2     Running   0          46h
  tidb-tikv-0                       1/1     Running   0          47h
  tidb-tikv-1                       1/1     Running   0          47h
  tidb-tikv-2                       1/1     Running   0          47h

Access the database

After you have deployed a TiDB cluster, you can access the TiDB database to test or develop your application.

Prepare a bastion host

The LoadBalancer created for your TiDB cluster is an intranet LoadBalancer. You can create a bastion host in the cluster VPC to access the database. To create a bastion host on AWS console, refer to AWS documentation.

Select the cluster’s VPC and Subnet, and verify whether the cluster name is correct in the dropdown box. You can view the cluster’s VPC and Subnet by running the following command:

  eksctl get cluster -n ${clusterName}

Allow the bastion host to access the Internet. Select the correct key pair so that you can log in to the host via SSH.

Note

In addition to the bastion host, you can also connect an existing host to the cluster VPC by VPC Peering. If the EKS cluster is created in an existing VPC, you can use the host in the VPC.

Install the MySQL client and connect

After the bastion host is created, you can connect to the bastion host via SSH and access the TiDB cluster via the MySQL client.

  1. Log in to the bastion host via SSH:

    ssh [-i /path/to/your/private-key.pem] ec2-user@<bastion-public-dns-name>
  2. Install the MySQL client on the bastion host:

    sudo yum install mysql -y
  3. Connect the client to the TiDB cluster:

    mysql --comments -h ${tidb-nlb-dnsname} -P 4000 -u root

    ${tidb-nlb-dnsname} is the LoadBalancer domain name of the TiDB service. You can view the domain name in the EXTERNAL-IP field by executing kubectl get svc basic-tidb -n tidb-cluster.

    For example:

    $ mysql --comments -h abfc623004ccb4cc3b363f3f37475af1-9774d22c27310bc1.elb.us-west-2.amazonaws.com -P 4000 -u root
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    Your MySQL connection id is 1189
    Server version: 5.7.25-TiDB-v4.0.2 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible
    Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    MySQL [(none)]> show status;
    +--------------------+--------------------------------------+
    | Variable_name      | Value                                |
    +--------------------+--------------------------------------+
    | Ssl_cipher         |                                      |
    | Ssl_cipher_list    |                                      |
    | Ssl_verify_mode    | 0                                    |
    | Ssl_version        |                                      |
    | ddl_schema_version | 22                                   |
    | server_id          | ed4ba88b-436a-424d-9087-977e897cf5ec |
    +--------------------+--------------------------------------+
    6 rows in set (0.00 sec)

Note

  • The default authentication plugin of MySQL 8.0 is updated from mysql_native_password to caching_sha2_password. Therefore, if you use MySQL client from MySQL 8.0 to access the TiDB service (cluster version < v4.0.7), and if the user account has a password, you need to explicitly specify the --default-auth=mysql_native_password parameter.
  • By default, TiDB (starting from v4.0.2) periodically shares usage details with PingCAP to help understand how to improve the product. For details about what is shared and how to disable the sharing, see Telemetry.

Access the Grafana monitoring dashboard

Obtain the LoadBalancer domain name of Grafana:

  kubectl -n tidb-cluster get svc basic-grafana

For example:

  $ kubectl get svc basic-grafana
  NAME            TYPE           CLUSTER-IP      EXTERNAL-IP                                                                PORT(S)          AGE
  basic-grafana   LoadBalancer   10.100.199.42   a806cfe84c12a4831aa3313e792e3eed-1964630135.us-west-2.elb.amazonaws.com   3000:30761/TCP   121m

In the output above, the EXTERNAL-IP column is the LoadBalancer domain name.

You can access the ${grafana-lb}:3000 address using your web browser to view monitoring metrics. Replace ${grafana-lb} with the LoadBalancer domain name.

Note

The default Grafana username and password are both admin.

Access the TiDB Dashboard

See Access TiDB Dashboard for instructions about how to securely allow access to the TiDB Dashboard.

Upgrade

To upgrade the TiDB cluster, execute the following command:

  kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"version":"${version}"}}'

The upgrade process does not finish immediately. You can watch the upgrade progress by executing kubectl get pods -n tidb-cluster --watch.
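
For example, the following sketch upgrades the cluster to a placeholder target version and then follows the rolling update; replace v6.5.0 with the version you actually want:

  kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"version":"v6.5.0"}}'
  # Follow the rolling restart of the Pods
  kubectl get pods -n tidb-cluster --watch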

Scale out

Before scaling out the cluster, you need to scale out the corresponding node group so that the new instances have enough resources for operation.

This section describes how to scale out the EKS node group and TiDB components.

Scale out EKS node group

When scaling out TiKV, the node groups must be scaled out evenly among the different availability zones. The following example shows how to scale out the tikv-1a, tikv-1c, and tikv-1d groups of the ${clusterName} cluster to 2 nodes:

  eksctl scale nodegroup --cluster ${clusterName} --name tikv-1a --nodes 2 --nodes-min 2 --nodes-max 2
  eksctl scale nodegroup --cluster ${clusterName} --name tikv-1c --nodes 2 --nodes-min 2 --nodes-max 2
  eksctl scale nodegroup --cluster ${clusterName} --name tikv-1d --nodes 2 --nodes-min 2 --nodes-max 2

For more information on managing node groups, refer to eksctl documentation.

Scale out TiDB components

After scaling out the EKS node group, execute kubectl edit tc basic -n tidb-cluster and modify each component's replicas to the desired number. This completes the scale-out process.
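
If you prefer a non-interactive command over kubectl edit, a merge patch also works. The following sketch sets TiKV to 6 replicas and TiDB to 3; adjust the numbers to your own target:

  kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"tikv":{"replicas":6},"tidb":{"replicas":3}}}'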

Deploy TiFlash/TiCDC

TiFlash is the columnar storage extension of TiKV.

TiCDC is a tool for replicating the incremental data of TiDB by pulling TiKV change logs.

Neither component is required in the deployment. This section shows a quick start example.

Add node groups

In the eksctl configuration file (cluster.yaml), add the following configuration to create node groups for TiFlash and TiCDC in each availability zone. desiredCapacity is the number of nodes you desire.

  - name: tiflash-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
  - name: tiflash-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
  - name: tiflash-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
  - name: ticdc-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
  - name: ticdc-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
  - name: ticdc-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule

Depending on the EKS cluster status, use different commands:

  • If the cluster is not created, execute eksctl create cluster -f cluster.yaml to create the cluster and node groups.
  • If the cluster is already created, execute eksctl create nodegroup -f cluster.yaml to create the node groups. The existing node groups are ignored and will not be created again.
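
In either case, you can list the node groups afterwards to confirm that the TiFlash and TiCDC groups exist, for example:

  eksctl get nodegroup --cluster ${clusterName}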

Configure and deploy

  • To deploy TiFlash, configure spec.tiflash in tidb-cluster.yaml:

    spec:
      ...
      tiflash:
        baseImage: pingcap/tiflash
        maxFailoverCount: 0
        replicas: 1
        storageClaims:
          - resources:
              requests:
                storage: 100Gi
        tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: tiflash

    For other parameters, refer to Configure a TiDB Cluster.

    Warning

    TiDB Operator automatically mounts PVs in the order of the configuration in the storageClaims list. Therefore, if you need to add disks for TiFlash, make sure that you add the disks only to the end of the original configuration in the list. In addition, you must not alter the order of the original configuration.

  • To deploy TiCDC, configure spec.ticdc in tidb-cluster.yaml:

    spec:
      ...
      ticdc:
        baseImage: pingcap/ticdc
        replicas: 1
        tolerations:
          - effect: NoSchedule
            key: dedicated
            operator: Equal
            value: ticdc

    Modify replicas according to your needs.

Finally, execute kubectl -n tidb-cluster apply -f tidb-cluster.yaml to update the TiDB cluster configuration.
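
After the update is applied, you can watch the new Pods come up. The label selector below follows the app.kubernetes.io/component convention used by TiDB Operator; adjust it if the labels in your deployment differ:

  kubectl get pods -n tidb-cluster -l app.kubernetes.io/component=tiflash
  kubectl get pods -n tidb-cluster -l app.kubernetes.io/component=ticdc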

For detailed CR configuration, refer to API references and Configure a TiDB Cluster.

Deploy TiDB Enterprise Edition

To deploy TiDB/PD/TiKV/TiFlash/TiCDC Enterprise Edition, configure spec.[tidb|pd|tikv|tiflash|ticdc].baseImage in tidb-cluster.yaml as the enterprise image. The enterprise image format is pingcap/[tidb|pd|tikv|tiflash|ticdc]-enterprise.

For example:

  spec:
    ...
    pd:
      baseImage: pingcap/pd-enterprise
      ...
    tikv:
      baseImage: pingcap/tikv-enterprise