Deploy Monitoring and Alerts for a TiDB Cluster

This document describes how to monitor a TiDB cluster deployed using TiDB Operator and configure alerts for the cluster.

Monitor the TiDB cluster

You can monitor the TiDB cluster with Prometheus and Grafana. When you create a new TiDB cluster using TiDB Operator, you can deploy a separate monitoring system for the TiDB cluster. The monitoring system must run in the same namespace as the TiDB cluster, and includes two components: Prometheus and Grafana.

For configuration details on the monitoring system, refer to TiDB Cluster Monitoring.

In TiDB Operator v1.1 or later versions, you can monitor a TiDB cluster on a Kubernetes cluster by using a simple Custom Resource (CR) file called TidbMonitor.

Note

  • spec.clusters[].name should be set to the TidbCluster name of the corresponding TiDB cluster.
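
For reference, the following is a minimal TidbMonitor sketch for a TidbCluster named basic. It is a trimmed-down version of the persistent example in the next section, with the persistence and service fields omitted; the file name tidb-monitor.yaml is a placeholder.

  apiVersion: pingcap.com/v1alpha1
  kind: TidbMonitor
  metadata:
    name: basic
  spec:
    clusters:
    - name: basic
    prometheus:
      baseImage: prom/prometheus
      version: v2.27.1
    grafana:
      baseImage: grafana/grafana
      version: 7.5.11
    initializer:
      baseImage: pingcap/tidb-monitor-initializer
      version: v5.4.0
    reloader:
      baseImage: pingcap/tidb-monitor-reloader
      version: v1.0.1
    prometheusReloader:
      baseImage: quay.io/prometheus-operator/prometheus-config-reloader
      version: v0.49.0
    imagePullPolicy: IfNotPresent

To deploy it, save the manifest to a file and apply it to the namespace of the TiDB cluster:

  kubectl apply -f tidb-monitor.yaml -n ${namespace}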

Persist monitoring data

The monitoring data is not persisted by default. To persist the monitoring data, set spec.persistent to true in TidbMonitor. When you enable this option, you also need to set spec.storageClassName to a storage class that already exists in the current Kubernetes cluster. This storage class must support persisting data; otherwise, there is a risk of data loss.

A configuration example is as follows:

  apiVersion: pingcap.com/v1alpha1
  kind: TidbMonitor
  metadata:
    name: basic
  spec:
    clusters:
    - name: basic
    persistent: true
    storageClassName: ${storageClassName}
    storage: 5G
    prometheus:
      baseImage: prom/prometheus
      version: v2.27.1
      service:
        type: NodePort
    grafana:
      baseImage: grafana/grafana
      version: 7.5.11
      service:
        type: NodePort
    initializer:
      baseImage: pingcap/tidb-monitor-initializer
      version: v5.4.0
    reloader:
      baseImage: pingcap/tidb-monitor-reloader
      version: v1.0.1
    prometheusReloader:
      baseImage: quay.io/prometheus-operator/prometheus-config-reloader
      version: v0.49.0
    imagePullPolicy: IfNotPresent

To verify the PVC status, run the following command:

  kubectl get pvc -l app.kubernetes.io/instance=basic,app.kubernetes.io/component=monitor -n ${namespace}

The expected output is as follows:

  NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  basic-monitor   Bound    pvc-6db79253-cc9e-4730-bbba-ba987c29db6f   5G         RWO            standard       51s

Customize the Prometheus configuration

You can customize the Prometheus configuration by using a customized configuration file or by adding extra options to the command.

Use a customized configuration file

  1. Create a ConfigMap for your customized configuration, and set the key name of data to prometheus-config (see the sketch after this list).
  2. Set spec.prometheus.config.configMapRef.name and spec.prometheus.config.configMapRef.namespace to the name and namespace of the customized ConfigMap respectively.
  3. Check whether dynamic configuration reload is enabled for TidbMonitor. If it is not, restart the TidbMonitor Pod to reload the configuration.
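
A minimal sketch of the first two steps is shown below. The ConfigMap name externally-scraped-config and the extra scrape job inside it are hypothetical examples; the only requirements stated above are the prometheus-config key and the configMapRef fields.

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: externally-scraped-config   # hypothetical name
    namespace: ${namespace}
  data:
    prometheus-config: |
      scrape_configs:
      - job_name: "external-exporter"          # hypothetical extra scrape job
        static_configs:
        - targets: ["192.168.0.10:9100"]

Then reference the ConfigMap from TidbMonitor:

  spec:
    prometheus:
      config:
        configMapRef:
          name: externally-scraped-config
          namespace: ${namespace}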

For the complete configuration, refer to the tidb-operator example.

Add extra options to the command

To add extra options to the command that starts Prometheus, configure spec.prometheus.config.commandOptions.
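
For example, the following is a minimal sketch that passes one extra flag to Prometheus; the specific flag and value are illustrative only:

  spec:
    prometheus:
      config:
        commandOptions:
        - --query.timeout=2m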

For the complete configuration, refer to the tidb-operator example.

Note

The following options are automatically configured by the TidbMonitor controller and cannot be specified again via commandOptions.

  • config.file
  • log.level
  • web.enable-admin-api
  • web.enable-lifecycle
  • storage.tsdb.path
  • storage.tsdb.retention
  • storage.tsdb.max-block-duration
  • storage.tsdb.min-block-duration

Access the Grafana monitoring dashboard

You can run the kubectl port-forward command to access the Grafana monitoring dashboard:

  kubectl port-forward -n ${namespace} svc/${cluster_name}-grafana 3000:3000 &>/tmp/portforward-grafana.log &

Then open http://localhost:3000 in your browser and log in with the default username and password, both of which are admin.

You can also set spec.grafana.service.type to NodePort or LoadBalancer, and then view the monitoring dashboard through NodePort or LoadBalancer.

If you do not need Grafana, you can remove the spec.grafana section from TidbMonitor during deployment. In this case, you need to use another existing or newly deployed data visualization tool to access the monitoring data directly.

Access the Prometheus monitoring data

To access the monitoring data directly, run the kubectl port-forward command to access Prometheus:

  kubectl port-forward -n ${namespace} svc/${cluster_name}-prometheus 9090:9090 &>/tmp/portforward-prometheus.log &

Then open http://localhost:9090 in your browser or access this address via a client tool.
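
For example, with the port forwarded, a quick check against the Prometheus HTTP API returns the up metric, which reports the health of each scrape target:

  curl 'http://localhost:9090/api/v1/query?query=up'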

You can also set spec.prometheus.service.type to NodePort or LoadBalancer, and then view the monitoring data through NodePort or LoadBalancer.

Set kube-prometheus and AlertManager

The Nodes-Info and Pods-Info monitoring dashboards are built into TidbMonitor Grafana by default, so that you can view the corresponding Kubernetes monitoring metrics.

To view these monitoring metrics in TidbMonitor Grafana, take the following steps:

  1. Deploy Kubernetes cluster monitoring manually.

    There are multiple ways to deploy Kubernetes cluster monitoring. To use kube-prometheus for deployment, see the kube-prometheus documentation.

  2. Set TidbMonitor.spec.kubePrometheusURL to the address of the Kubernetes Prometheus service so that TidbMonitor can obtain the Kubernetes monitoring data.

Similarly, you can configure TidbMonitor to push monitoring alerts to AlertManager. A configuration example is as follows:

  apiVersion: pingcap.com/v1alpha1
  kind: TidbMonitor
  metadata:
    name: basic
  spec:
    clusters:
    - name: basic
    kubePrometheusURL: http://prometheus-k8s.monitoring:9090
    alertmanagerURL: alertmanager-main.monitoring:9093
    prometheus:
      baseImage: prom/prometheus
      version: v2.27.1
      service:
        type: NodePort
    grafana:
      baseImage: grafana/grafana
      version: 7.5.11
      service:
        type: NodePort
    initializer:
      baseImage: pingcap/tidb-monitor-initializer
      version: v5.4.0
    reloader:
      baseImage: pingcap/tidb-monitor-reloader
      version: v1.0.1
    prometheusReloader:
      baseImage: quay.io/prometheus-operator/prometheus-config-reloader
      version: v0.49.0
    imagePullPolicy: IfNotPresent

Enable Ingress

This section introduces how to enable Ingress for TidbMonitor. Ingress is an API object that exposes HTTP and HTTPS routes from outside the cluster to services within the cluster.

Prerequisites

Before using Ingress, you need to install an Ingress controller in your Kubernetes cluster; simply creating the Ingress resource does not take effect.

You can deploy the NGINX Ingress controller, or choose from various other Ingress controllers.

For more information, see Ingress Prerequisites.
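
As an example, one common way to install the NGINX Ingress controller is through its official Helm chart; this sketch assumes Helm is available and that your environment can pull the chart:

  helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
  helm repo update
  helm install ingress-nginx ingress-nginx/ingress-nginx \
    --namespace ingress-nginx --create-namespace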

Access TidbMonitor using Ingress

Currently, TidbMonitor provides a method to expose the Prometheus/Grafana service through Ingress. For details about Ingress, see Ingress official documentation.

The following example shows how to enable Ingress for the Prometheus and Grafana services of TidbMonitor:

  apiVersion: pingcap.com/v1alpha1
  kind: TidbMonitor
  metadata:
    name: ingress-demo
  spec:
    clusters:
    - name: demo
    persistent: false
    prometheus:
      baseImage: prom/prometheus
      version: v2.27.1
      ingress:
        hosts:
        - example.com
        annotations:
          foo: "bar"
    grafana:
      baseImage: grafana/grafana
      version: 7.5.11
      service:
        type: ClusterIP
      ingress:
        hosts:
        - example.com
        annotations:
          foo: "bar"
    initializer:
      baseImage: pingcap/tidb-monitor-initializer
      version: v5.4.0
    reloader:
      baseImage: pingcap/tidb-monitor-reloader
      version: v1.0.1
    prometheusReloader:
      baseImage: quay.io/prometheus-operator/prometheus-config-reloader
      version: v0.49.0
    imagePullPolicy: IfNotPresent

To modify the Ingress annotations, configure spec.prometheus.ingress.annotations and spec.grafana.ingress.annotations. If you use the default NGINX Ingress controller, see NGINX Ingress Controller Annotation for details.

The TidbMonitor Ingress setting also supports TLS. The following example shows how to configure TLS for Ingress. See Ingress TLS for details.

  apiVersion: pingcap.com/v1alpha1
  kind: TidbMonitor
  metadata:
    name: ingress-demo
  spec:
    clusters:
    - name: demo
    persistent: false
    prometheus:
      baseImage: prom/prometheus
      version: v2.27.1
      ingress:
        hosts:
        - example.com
        tls:
        - hosts:
          - example.com
          secretName: testsecret-tls
    grafana:
      baseImage: grafana/grafana
      version: 7.5.11
      service:
        type: ClusterIP
    initializer:
      baseImage: pingcap/tidb-monitor-initializer
      version: v5.4.0
    reloader:
      baseImage: pingcap/tidb-monitor-reloader
      version: v1.0.1
    prometheusReloader:
      baseImage: quay.io/prometheus-operator/prometheus-config-reloader
      version: v0.49.0
    imagePullPolicy: IfNotPresent

The TLS Secret must contain the tls.crt and tls.key keys, which hold the certificate and private key used for TLS. For example:

  apiVersion: v1
  kind: Secret
  metadata:
    name: testsecret-tls
    namespace: ${namespace}
  data:
    tls.crt: base64 encoded cert
    tls.key: base64 encoded key
  type: kubernetes.io/tls
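
Alternatively, you can create an equivalent Secret directly from certificate files; the file paths below are placeholders:

  kubectl create secret tls testsecret-tls \
    --cert=path/to/tls.crt --key=path/to/tls.key \
    -n ${namespace}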

In a Kubernetes cluster deployed on public cloud, you can usually configure a LoadBalancer to access Ingress through a domain name. If you cannot configure a LoadBalancer service (for example, when you use NodePort as the service type of the Ingress controller), you can access the service in a way equivalent to the following command:

  curl -H "Host: example.com" ${node_ip}:${NodePort}
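
To find values for ${node_ip} and ${NodePort}, you can inspect the nodes and the Ingress controller Service. The namespace and Service name below assume an NGINX Ingress controller installed as ingress-nginx; adjust them to your deployment:

  kubectl get nodes -o wide
  kubectl get svc -n ingress-nginx ingress-nginx-controller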

Configure alert

When Prometheus is deployed with a TiDB cluster, some default alert rules are automatically imported. You can view all alert rules and statuses in the current system by accessing the Alerts page of Prometheus through a browser.

Custom alert rules are also supported. To modify the alert rules, take the following steps:

  1. When deploying the monitoring system for the TiDB cluster, set spec.reloader.service.type to NodePort or LoadBalancer, as shown in the sketch after this list.
  2. Access the reloader service through NodePort or LoadBalancer. Click the Files button at the top of the page to select the alert rule file to be modified, and make the custom configuration. Click Save after the modification.
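
A minimal sketch of step 1, exposing the reloader service through NodePort (the image fields follow the earlier examples in this document):

  spec:
    reloader:
      baseImage: pingcap/tidb-monitor-reloader
      version: v1.0.1
      service:
        type: NodePort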

The default Prometheus and alert configuration does not support sending alert messages. To send alert messages, you can integrate Prometheus with any tool that supports Prometheus alerts. It is recommended to manage and send alert messages via AlertManager.

  • If you already have an available AlertManager service in your existing infrastructure, you can set the value of spec.alertmanagerURL to the address of AlertManager, which will be used by Prometheus. For details, refer to Set kube-prometheus and AlertManager.

  • If no AlertManager service is available, or if you want to deploy a separate AlertManager service, refer to the Prometheus official document.

Monitor multiple clusters

Starting from TiDB Operator 1.2, TidbMonitor supports monitoring multiple clusters across namespaces.

Configure the monitoring of multiple clusters using YAML files

You can monitor multiple clusters, regardless of whether TLS is enabled, by configuring the TidbMonitor YAML file.

A configuration example is as follows:

  apiVersion: pingcap.com/v1alpha1
  kind: TidbMonitor
  metadata:
    name: basic
  spec:
    clusters:
    - name: ns1
      namespace: ns1
    - name: ns2
      namespace: ns2
    persistent: true
    storage: 5G
    prometheus:
      baseImage: prom/prometheus
      version: v2.27.1
      service:
        type: NodePort
    grafana:
      baseImage: grafana/grafana
      version: 7.5.11
      service:
        type: NodePort
    initializer:
      baseImage: pingcap/tidb-monitor-initializer
      version: v5.4.0
    reloader:
      baseImage: pingcap/tidb-monitor-reloader
      version: v1.0.1
    prometheusReloader:
      baseImage: quay.io/prometheus-operator/prometheus-config-reloader
      version: v0.49.0
    imagePullPolicy: IfNotPresent

For a complete configuration example, refer to Example in the TiDB Operator repository.

Monitor multiple clusters using Grafana

If the tidb-monitor-initializer image is earlier than v4.0.14 or v5.0.3, to monitor multiple clusters, you can take the following steps in each Grafana Dashboard:

  1. On Grafana Dashboard, click Dashboard settings to open the Settings panel.
  2. On the Settings panel, select the tidb_cluster variable from Variables, and then set the Hide property of the tidb_cluster variable to the null option in the drop-down list.
  3. Get back to the current Grafana Dashboard (changes to the Hide property cannot be saved currently), and you can see the drop-down list for cluster selection. The cluster name in the drop-down list is in the ${namespace}-${name} format.

If you need to save changes to the Grafana Dashboard, Grafana must be 6.5 or later, and TiDB Operator must be v1.2.0-rc.2 or later.