Aggregate Monitoring Data of Multiple TiDB Clusters

This document describes how to use Thanos to aggregate the monitoring data of multiple TiDB clusters and provide a centralized monitoring service.

Thanos

Thanos is a high-availability solution for Prometheus that makes it simpler to keep Prometheus available.

Thanos provides the Thanos Query component as a unified query layer across multiple Prometheus clusters. You can use this component to aggregate the monitoring data of multiple TiDB clusters.

Aggregate monitoring data via Thanos Query

Configure Thanos Query

  1. Configure a Thanos Sidecar container for each TidbMonitor.

    A configuration example is as follows.

        kubectl -n ${namespace} apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/monitor-with-thanos/tidb-monitor.yaml

  2. Deploy the Thanos Query component.

    1. Download the thanos-query.yaml file for Thanos Query deployment:

          curl -sL -O https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/monitor-with-thanos/thanos-query.yaml

    2. In the thanos-query.yaml file, change the --store parameter from basic-prometheus:10901 to basic-prometheus.${namespace}:10901.

      ${namespace} is the namespace where TidbMonitor is deployed.

    3. Deploy Thanos Query with the kubectl apply command:

          kubectl -n ${thanos_namespace} apply -f thanos-query.yaml

      In the command above, ${thanos_namespace} is the namespace where the Thanos Query component is deployed.
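
The manual edit of the --store parameter in sub-step 2 can also be scripted. The following is a minimal sketch, not part of the official example: rewrite_store is a hypothetical helper name, and it assumes the downloaded file still contains the default address basic-prometheus:10901.

```shell
# Hypothetical helper: point the default store address in a downloaded
# thanos-query.yaml at the namespace where TidbMonitor is deployed.
#   $1 = path to thanos-query.yaml
#   $2 = namespace of the TidbMonitor
rewrite_store() {
  file="$1"
  namespace="$2"
  # Write to a temporary file first so the edit works with both GNU and BSD sed.
  sed "s/basic-prometheus:10901/basic-prometheus.${namespace}:10901/" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
}

# Example (run in the directory that contains the downloaded file):
#   rewrite_store thanos-query.yaml "${namespace}"
```

Running the helper a second time leaves the file unchanged, because the rewritten address no longer matches the default pattern.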

In Thanos Query, each Prometheus instance is a store, and each store corresponds to one TidbMonitor. After Thanos Query is deployed, its API serves as a unified query interface for the monitoring data of all stores.
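
As an illustration of that query interface, the sketch below sends an instant query to the Prometheus-compatible /api/v1/query endpoint. It assumes that Thanos Query is reachable at a base URL such as http://127.0.0.1:9090 (for example, through the port-forward described in the next section). The function name thanos_instant_query and the CURL override variable are hypothetical and exist only for this example.

```shell
# Hypothetical helper: run an instant PromQL query against Thanos Query.
#   $1 = base URL of Thanos Query, for example http://127.0.0.1:9090
#   $2 = PromQL expression, for example up
# Set CURL to another command (such as echo) to preview the call
# without sending a request.
thanos_instant_query() {
  base_url="${1%/}"   # drop one trailing slash, if present
  "${CURL:-curl}" --silent --get --data-urlencode "query=$2" "${base_url}/api/v1/query"
}

# Example:
#   thanos_instant_query http://127.0.0.1:9090 up
```

The response is JSON in the Prometheus HTTP API format, and it covers the series from every store that Thanos Query is connected to.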

Access the Thanos Query panel

To access the Thanos Query panel, execute the following command, and then access http://127.0.0.1:9090 in your browser:

    kubectl port-forward -n ${thanos_namespace} svc/thanos-query 9090

If you want to access the Thanos Query panel using NodePort or LoadBalancer, refer to the Kubernetes documentation for those Service types.

Configure Grafana

After deploying Thanos Query, take the following steps to query the monitoring data of multiple TidbMonitors in Grafana:

  1. Log in to Grafana.
  2. In the left navigation bar, select Configuration > Data Sources.
  3. Add or modify a data source of the Prometheus type.
  4. Set the URL under HTTP to http://thanos-query.${thanos_namespace}:9090.

Add or remove TidbMonitor

In Thanos Query, each Prometheus instance is a monitor store, and each monitor store corresponds to one TidbMonitor. To add, update, or remove a monitor store, update the --store configuration of the Thanos Query component, and then perform a rolling update of the component.

    spec:
      containers:
      - args:
        - query
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:9090
        - --log.level=debug
        - --log.format=logfmt
        - --query.replica-label=prometheus_replica
        - --query.replica-label=rule_replica
        - --store=<TidbMonitorName1>-prometheus.<TidbMonitorNs1>:10901
        - --store=<TidbMonitorName2>-prometheus.<TidbMonitorNs2>:10901
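
When many TidbMonitors are involved, the --store lines can be generated rather than typed by hand. The following sketch assumes the naming pattern shown above (<TidbMonitorName>-prometheus.<TidbMonitorNs>:10901). The store_args helper and its "name/namespace" input format are hypothetical and chosen only for this example.

```shell
# Hypothetical helper: print one "- --store=..." argument line per TidbMonitor.
# Each argument has the form "<TidbMonitorName>/<TidbMonitorNs>".
store_args() {
  for monitor in "$@"; do
    name="${monitor%%/*}"   # text before the first slash
    ns="${monitor#*/}"      # text after the first slash
    printf '%s\n' "- --store=${name}-prometheus.${ns}:10901"
  done
}

# Example:
#   store_args basic/tidb-a basic/tidb-b
# prints:
#   - --store=basic-prometheus.tidb-a:10901
#   - --store=basic-prometheus.tidb-b:10901
```

After pasting the generated lines into the args list of thanos-query.yaml, re-apply the file with kubectl -n ${thanos_namespace} apply -f thanos-query.yaml. If Thanos Query runs as a Deployment named thanos-query (an assumption, check your manifest), kubectl -n ${thanos_namespace} rollout status deployment/thanos-query reports when the rolling update has finished.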

Configure archives and storage of Thanos Sidecar

Note

To ensure successful configuration, you must first create the S3 bucket. If you choose AWS S3, refer to AWS documentation - Create AWS S3 Bucket and AWS documentation - AWS S3 Endpoint List for instructions.

Thanos Sidecar supports replicating monitoring data to S3 remote storage.

The configuration of the TidbMonitor CR is as follows:

    spec:
      thanos:
        baseImage: thanosio/thanos
        version: v0.17.2
        objectStorageConfig:
          key: objectstorage.yaml
          name: thanos-objectstorage

You also need to create a Secret that holds the object storage configuration. For example:

    apiVersion: v1
    kind: Secret
    metadata:
      name: thanos-objectstorage
    type: Opaque
    stringData:
      objectstorage.yaml: |
        type: S3
        config:
          bucket: "xxxxxx"
          endpoint: "xxxx"
          region: ""
          access_key: "xxxx"
          insecure: true
          signature_version2: true
          secret_key: "xxxx"
          put_user_metadata: {}
          http_config:
            idle_conn_timeout: 90s
            response_header_timeout: 2m
          trace:
            enable: true
          part_size: 41943040
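
As an alternative to writing the Secret manifest by hand, the object storage file can be generated locally and turned into a Secret with kubectl. The sketch below is illustrative only: write_objstore_config is a hypothetical helper, and it emits just a minimal subset of the fields shown above (bucket, endpoint, access_key, and secret_key).

```shell
# Hypothetical helper: write a minimal Thanos S3 object storage config.
#   $1 = output file   $2 = bucket   $3 = endpoint
#   $4 = access key    $5 = secret key
write_objstore_config() {
  cat > "$1" <<EOF
type: S3
config:
  bucket: "$2"
  endpoint: "$3"
  access_key: "$4"
  secret_key: "$5"
EOF
}

# Example (the bucket and endpoint values are placeholders):
#   write_objstore_config objectstorage.yaml my-bucket s3.example.com "$ACCESS_KEY" "$SECRET_KEY"
#   kubectl -n ${namespace} create secret generic thanos-objectstorage \
#     --from-file=objectstorage.yaml=objectstorage.yaml
```

The Secret name (thanos-objectstorage) and key (objectstorage.yaml) must match the name and key fields under objectStorageConfig in the TidbMonitor CR.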

RemoteWrite mode

Besides aggregating data via Thanos Query, you can also push monitoring data to Thanos using Prometheus’ RemoteWrite feature.

To enable the RemoteWrite mode, specify the Prometheus RemoteWrite configuration when you create the TidbMonitor CR. For example:

    apiVersion: pingcap.com/v1alpha1
    kind: TidbMonitor
    metadata:
      name: basic
    spec:
      clusters:
        - name: basic
      prometheus:
        baseImage: prom/prometheus
        version: v2.27.1
        remoteWrite:
          - url: "http://thanos-receiver:19291/api/v1/receive"
      grafana:
        baseImage: grafana/grafana
        version: 7.5.11
      initializer:
        baseImage: registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-initializer
        version: v5.4.0
      reloader:
        baseImage: registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-reloader
        version: v1.0.1
      prometheusReloader:
        baseImage: quay.io/prometheus-operator/prometheus-config-reloader
        version: v0.49.0
      imagePullPolicy: IfNotPresent

After RemoteWrite is enabled, Prometheus pushes the monitoring data to Thanos Receiver. For more information, refer to the design of Thanos Receiver.

For details on the deployment, refer to this example of integrating TidbMonitor with Thanos Receiver.