Deploying a TDengine Cluster in Kubernetes

Deploying a TDengine Cluster in Kubernetes

TDengine is a cloud-native time-series database that can be deployed on Kubernetes. This document gives a step-by-step description of how you can use YAML files to create a TDengine cluster and introduces common operations for TDengine in a Kubernetes environment.

Prerequisites

Before deploying TDengine on Kubernetes, perform the following:

Current steps are compatible with Kubernetes v1.5 and later version.
Install and configure minikube, kubectl, and helm.
Install and deploy Kubernetes and ensure that it can be accessed and used normally. Update any container registries or other services as necessary.

You can download the configuration files in this document from GitHub.

Configure the service

Create a service configuration file named taosd-service.yaml. Record the value of metadata.name (in this example, taos) for use in the next step. Add the ports required by TDengine:

---
apiVersion: v1
kind: Service
metadata:
  name: "taosd"
  labels:
    app: "tdengine"
spec:
  ports:
    - name: tcp6030
      - protocol: "TCP"
      port: 6030
    - name: tcp6041
      - protocol: "TCP"
      port: 6041
  selector:
    app: "tdengine"

Configure the service as StatefulSet

Configure the TDengine service as a StatefulSet. Create the tdengine.yaml file and set replicas to 3. In this example, the region is set to Asia/Shanghai and 10 GB of standard storage are allocated per node. You can change the configuration based on your environment and business requirements.

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: "tdengine"
  labels:
    app: "tdengine"
spec:
  serviceName: "taosd"
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: "tdengine"
  template:
    metadata:
      name: "tdengine"
      labels:
        app: "tdengine"
    spec:
      containers:
        - name: "tdengine"
          image: "tdengine/tdengine:3.0.0.0"
          imagePullPolicy: "IfNotPresent"
          ports:
            - name: tcp6030
              - protocol: "TCP"
              containerPort: 6030
            - name: tcp6041
              - protocol: "TCP"
              containerPort: 6041
          env:
            # POD_NAME for FQDN config
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # SERVICE_NAME and NAMESPACE for fqdn resolve
            - name: SERVICE_NAME
              value: "taosd"
            - name: STS_NAME
              value: "tdengine"
            - name: STS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # TZ for timezone settings, we recommend to always set it.
            - name: TZ
              value: "Asia/Shanghai"
            # TAOS_ prefix will configured in taos.cfg, strip prefix and camelCase.
            - name: TAOS_SERVER_PORT
              value: "6030"
            # Must set if you want a cluster.
            - name: TAOS_FIRST_EP
              value: "$(STS_NAME)-0.$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local:$(TAOS_SERVER_PORT)"
            # TAOS_FQDN should always be set in k8s env.
            - name: TAOS_FQDN
              value: "$(POD_NAME).$(SERVICE_NAME).$(STS_NAMESPACE).svc.cluster.local"
          volumeMounts:
            - name: taosdata
              mountPath: /var/lib/taos
          readinessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 5
            timeoutSeconds: 5000
          livenessProbe:
            exec:
              command:
                - taos-check
            initialDelaySeconds: 15
            periodSeconds: 20
  volumeClaimTemplates:
    - metadata:
        name: taosdata
      spec:
        accessModes:
          - "ReadWriteOnce"
        storageClassName: "standard"
        resources:
          requests:
            storage: "10Gi"

Use kubectl to deploy TDengine

Run the following commands:

kubectl apply -f taosd-service.yaml
kubectl apply -f tdengine.yaml

The preceding configuration generates a TDengine cluster with three nodes in which dnodes are automatically configured. You can run the show dnodes command to query the nodes in the cluster:

kubectl exec -i -t tdengine-0 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-1 -- taos -s "show dnodes"
kubectl exec -i -t tdengine-2 -- taos -s "show dnodes"

The output is as follows:

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:14:57.285 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:15:11.302 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:15:23.290 |                                |
Query OK, 3 rows in database (0.003655s)

Enable port forwarding

The kubectl port forwarding feature allows applications to access the TDengine cluster running on Kubernetes.

kubectl port-forward tdengine-0 6041:6041 &

Use curl to verify that the TDengine REST API is working on port 6041:

$ curl -u root:taosdata -d "show databases" 127.0.0.1:6041/rest/sql
Handling connection for 6041
{"code":0,"column_meta":[["name","VARCHAR",64],["create_time","TIMESTAMP",8],["vgroups","SMALLINT",2],["ntables","BIGINT",8],["replica","TINYINT",1],["strict","VARCHAR",4],["duration","VARCHAR",10],["keep","VARCHAR",32],["buffer","INT",4],["pagesize","INT",4],["pages","INT",4],["minrows","INT",4],["maxrows","INT",4],["comp","TINYINT",1],["precision","VARCHAR",2],["status","VARCHAR",10],["retention","VARCHAR",60],["single_stable","BOOL",1],["cachemodel","VARCHAR",11],["cachesize","INT",4],["wal_level","TINYINT",1],["wal_fsync_period","INT",4],["wal_retention_period","INT",4],["wal_retention_size","BIGINT",8],["wal_roll_period","INT",4],["wal_segment_size","BIGINT",8]],"data":[["information_schema",null,null,16,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null],["performance_schema",null,null,10,null,null,null,null,null,null,null,null,null,null,null,"ready",null,null,null,null,null,null,null,null,null,null]],"rows":2}

Enable the dashboard for visualization

The minikube dashboard command enables visualized cluster management.

$ minikube dashboard
* Verifying dashboard health ...
* Launching proxy ...
* Verifying proxy health ...
* Opening http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/ in your default browser...
http://127.0.0.1:46617/api/v1/namespaces/kubernetes-dashboard/services/http:kubernetes-dashboard:/proxy/

In some public clouds, minikube cannot be remotely accessed if it is bound to 127.0.0.1. In this case, use the kubectl proxy command to map the port to 0.0.0.0. Then, you can access the dashboard by using a web browser to open the dashboard URL above on the public IP address and port of the virtual machine.

$ kubectl proxy --accept-hosts='^.*$' --address='0.0.0.0'

Scaling Out Your Cluster

TDengine clusters can scale automatically:

kubectl scale statefulsets tdengine --replicas=4

The preceding command increases the number of replicas to 4. After running this command, query the pod status:

kubectl get pods -l app=tdengine

The output is as follows:

NAME         READY   STATUS    RESTARTS   AGE
tdengine-0   1/1     Running   0          161m
tdengine-1   1/1     Running   0          161m
tdengine-2   1/1     Running   0          32m
tdengine-3   1/1     Running   0          32m

The status of all pods is Running. Once the pod status changes to Ready, you can check the dnode status:

kubectl exec -i -t tdengine-3 -- taos -s "show dnodes"

The following output shows that the TDengine cluster has been expanded to 4 replicas:

taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:14:57.285 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:15:11.302 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:15:23.290 |                                |
      4 | tdengine-3.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:33:16.039 |                                |
Query OK, 4 rows in database (0.008377s)

Scaling In Your Cluster

When you scale in a TDengine cluster, your data is migrated to different nodes. You must run the drop dnodes command in TDengine to remove dnodes before scaling in your Kubernetes environment.

Note: In a Kubernetes StatefulSet service, the newest pods are always removed first. For this reason, when you scale in your TDengine cluster, ensure that you drop the newest dnodes.

$ kubectl exec -i -t tdengine-0 -- taos -s "drop dnode 4"

$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
taos> show dnodes
   id   |            endpoint            | vnodes | support_vnodes |   status   |       create_time       |              note              |
============================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:14:57.285 |                                |
      2 | tdengine-1.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:15:11.302 |                                |
      3 | tdengine-2.taosd.default.sv... |      0 |            256 | ready      | 2022-08-10 13:15:23.290 |                                |
Query OK, 3 rows in database (0.004861s)

Verify that the dnode have been successfully removed by running the kubectl exec -i -t tdengine-0 -- taos -s "show dnodes" command. Then run the following command to remove the pod:

kubectl scale statefulsets tdengine --replicas=3

The newest pod in the deployment is removed. Run the kubectl get pods -l app=tdengine command to query the pod status:

$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 4m7s
tdengine-1 1/1 Running 0 3m55s
tdengine-2 1/1 Running 0 2m28s

After the pod has been removed, manually delete the PersistentVolumeClaim (PVC). Otherwise, future scale-outs will attempt to use existing data.

$ kubectl delete pvc taosdata-tdengine-3

Your cluster has now been safely scaled in, and you can scale it out again as necessary.

$ kubectl scale statefulsets tdengine --replicas=4
statefulset.apps/tdengine scaled
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 ContainerCreating 0 4s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl get pods -l app=tdengine
NAME READY STATUS RESTARTS AGE
tdengine-0 1/1 Running 0 35m
tdengine-1 1/1 Running 0 34m
tdengine-2 1/1 Running 0 12m
tdengine-3 0/1 Running 0 7s
it@k8s-2:~/TDengine-Operator/src/tdengine$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:01:36.479 | |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 18:13:54.411 | |
Query OK, 4 row(s) in set (0.001348s)

Remove a TDengine Cluster

To fully remove a TDengine cluster, you must delete its statefulset, svc, configmap, and pvc entries:

kubectl delete statefulset -l app=tdengine
kubectl delete svc -l app=tdengine
kubectl delete pvc -l app=tdengine
kubectl delete configmap taoscfg

Troubleshooting

Error 1

If you remove a pod without first running drop dnode, some TDengine nodes will go offline.

$ kubectl exec -it tdengine-0 -- taos -s "show dnodes"
taos> show dnodes
id | endpoint | vnodes | support_vnodes | status | create_time | offline reason |
======================================================================================================================================
1 | tdengine-0.taosd.default.sv... | 0 | 4 | ready | 2022-07-25 17:38:49.012 | |
2 | tdengine-1.taosd.default.sv... | 1 | 4 | ready | 2022-07-25 17:39:01.517 | |
5 | tdengine-2.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:01:36.479 | status msg timeout |
6 | tdengine-3.taosd.default.sv... | 0 | 4 | offline | 2022-07-25 18:13:54.411 | status msg timeout |
Query OK, 4 row(s) in set (0.001323s)

Error 2

If the number of nodes after a scale-in is less than the value of the replica parameter, the cluster will go down:

Create a database with replica set to 2 and add data.

kubectl exec -i -t tdengine-0 -- \
  taos -s \
  "create database if not exists test replica 2;
   use test;
   create table if not exists t1(ts timestamp, n int);
   insert into t1 values(now, 1)(now+1s, 2);"

Scale in to one node:

kubectl scale statefulsets tdengine --replicas=1

In the TDengine CLI, you can see that no database operations succeed:

taos> show dnodes;
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      2 |     40 | ready      | any   | 2021-06-01 15:55:52.562 |                          |
      2 | tdengine-1.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 15:56:07.212 | status msg timeout       |
Query OK, 2 row(s) in set (0.000845s)
taos> show dnodes;
   id   |           end_point            | vnodes | cores  |   status   | role  |       create_time       |      offline reason      |
======================================================================================================================================
      1 | tdengine-0.taosd.default.sv... |      2 |     40 | ready      | any   | 2021-06-01 15:55:52.562 |                          |
      2 | tdengine-1.taosd.default.sv... |      1 |     40 | offline    | any   | 2021-06-01 15:56:07.212 | status msg timeout       |
Query OK, 2 row(s) in set (0.000837s)
taos> use test;
Database changed.
taos> insert into t1 values(now, 3);
DB error: Unable to resolve FQDN (0.013874s)