Monitor Tool

Prometheus

The mapping from metric type to Prometheus format

For a metric whose Metric Name is name and whose Tags are K1=V1, …, Kn=Vn, the mapping is as follows, where value
stands for the actual value of the metric:

Metric Type        Mapping
Counter            name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
AutoGauge, Gauge   name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
Histogram          name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
                   name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
                   name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
                   name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", quantile="0.5"} value
                   name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", quantile="0.99"} value
Rate               name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
                   name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", rate="m1"} value
                   name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", rate="m5"} value
                   name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", rate="m15"} value
                   name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", rate="mean"} value
Timer              name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
                   name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
                   name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn"} value
                   name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", quantile="0.5"} value
                   name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", K1="V1", …, Kn="Vn", quantile="0.99"} value
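To make the mapping above concrete, the following minimal Python sketch turns one metric into Prometheus text lines, one line per row of the table. It is an illustration only, not IoTDB's reporter code: the function names label_block and render_metric, the keys of the values dictionary (count, value, max, sum, p50, p99, m1, m5, m15, mean) and the example label values are all hypothetical.

```python
from typing import Dict, List, Optional


def label_block(base: Dict[str, str], tags: Dict[str, str],
                extra: Optional[Dict[str, str]] = None) -> str:
    # cluster/nodeType/nodeId come first, then the metric's own tags K1..Kn,
    # then an optional extra label such as quantile or rate.
    merged = {**base, **tags, **(extra or {})}
    return "{" + ",".join(f'{k}="{v}"' for k, v in merged.items()) + "}"


def render_metric(metric_type: str, name: str, base: Dict[str, str],
                  tags: Dict[str, str], values: Dict[str, float]) -> List[str]:
    """Hypothetical helper: render one metric following the mapping table."""
    plain = label_block(base, tags)
    if metric_type == "Counter":
        return [f"{name}_total{plain} {values['count']}"]
    if metric_type in ("Gauge", "AutoGauge"):
        return [f"{name}{plain} {values['value']}"]
    if metric_type in ("Histogram", "Timer"):
        # Timers carry the _seconds suffix; histograms use the bare name.
        base_name = f"{name}_seconds" if metric_type == "Timer" else name
        p50 = label_block(base, tags, {"quantile": "0.5"})
        p99 = label_block(base, tags, {"quantile": "0.99"})
        return [
            f"{base_name}_max{plain} {values['max']}",
            f"{base_name}_sum{plain} {values['sum']}",
            f"{base_name}_count{plain} {values['count']}",
            f"{base_name}{p50} {values['p50']}",
            f"{base_name}{p99} {values['p99']}",
        ]
    if metric_type == "Rate":
        # One plain total plus one line per rate window.
        lines = [f"{name}_total{plain} {values['count']}"]
        for rate in ("m1", "m5", "m15", "mean"):
            labels = label_block(base, tags, {"rate": rate})
            lines.append(f"{name}_total{labels} {values[rate]}")
        return lines
    raise ValueError(f"unknown metric type: {metric_type}")
```

For example, render_metric("Counter", "points", base, {"database": "root.db"}, {"count": 3.0}) with base = {"cluster": "defaultCluster", "nodeType": "DataNode", "nodeId": "1"} yields a single points_total line whose labels are cluster, nodeType and nodeId followed by database. The names points and root.db are made up for the example.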

Config File

Taking a DataNode as an example, modify the iotdb-datanode.properties configuration file as follows:

    dn_metric_reporter_list=PROMETHEUS
    dn_metric_level=CORE
    dn_metric_prometheus_reporter_port=9091
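When several DataNodes have to be configured, the edit above can be scripted. The sketch below is a hypothetical helper, not part of IoTDB: it rewrites every active key=value line for the three keys in the text of an iotdb-datanode.properties file and appends any key that is missing or only present in a comment. It assumes plain key=value lines (no "key: value" or "key value" separators and no line continuations).

```python
import re
from typing import Dict

# The three settings shown above.
PROMETHEUS_SETTINGS = {
    "dn_metric_reporter_list": "PROMETHEUS",
    "dn_metric_level": "CORE",
    "dn_metric_prometheus_reporter_port": "9091",
}

# Active (uncommented) key=value lines; lines starting with '#' never match.
KEY_RE = re.compile(r"^\s*([A-Za-z0-9_.]+)\s*=")


def apply_settings(text: str, settings: Dict[str, str]) -> str:
    """Hypothetical helper: set the given keys in properties-file text."""
    seen = set()
    out = []
    for line in text.splitlines():
        m = KEY_RE.match(line)
        if m and m.group(1) in settings:
            key = m.group(1)
            # Replace every active occurrence so no stale value survives.
            out.append(f"{key}={settings[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Keys that were absent, or only present as comments, go at the end.
    out += [f"{k}={v}" for k, v in settings.items() if k not in seen]
    return "\n".join(out) + "\n"
```

A possible usage is to read the file, call apply_settings(text, PROMETHEUS_SETTINGS), and write the result back before restarting the DataNode.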

You can then retrieve the metric data as follows:

  1. Start the IoTDB DataNodes.
  2. Open a browser or use curl to visit http://server_ip:9091/metrics. You will get metric data like the following:
    ...
    # HELP file_count
    # TYPE file_count gauge
    file_count{name="wal",} 0.0
    file_count{name="unseq",} 0.0
    file_count{name="seq",} 2.0
    ...
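The same endpoint can also be read from a script. The minimal sketch below fetches /metrics with Python's standard urllib.request and parses sample lines of the shape shown above into (name, labels, value) tuples. The function names and the server_ip placeholder are hypothetical, and the parser is deliberately simplified: it assumes label values contain no escaped quotes or braces and that lines carry no timestamps. For full coverage of the text format, the prometheus_client package provides prometheus_client.parser.text_string_to_metric_families.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# Matches sample lines such as: file_count{name="wal",} 0.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# Matches one label pair such as: name="wal"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Simplified parser for the Prometheus text format (see assumptions)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # e.g. the "..." lines, or lines with timestamps
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(host: str, port: int = 9091) -> str:
    """Download the raw text exposed at http://host:port/metrics."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For instance, parse_metrics(fetch_metrics("server_ip")) would return the file_count samples above as ("file_count", {"name": "seq"}, 2.0) and so on, with server_ip replaced by the DataNode's address.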

Prometheus + Grafana

As shown above, IoTDB exposes its monitoring metrics in the standard Prometheus format. Prometheus can be used to
collect and store these metrics, and Grafana can be used to visualize them.

The following figure shows the relationships among IoTDB, Prometheus and Grafana:

[Figure: iotdb_prometheus_grafana]

  1. While running, IoTDB continuously collects its metrics.
  2. Prometheus scrapes the metrics from IoTDB at a fixed, configurable interval.
  3. Prometheus stores these metrics in its built-in time series database (TSDB).
  4. Grafana queries the metrics from Prometheus at a fixed, configurable interval and displays them in graphs.
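Because Prometheus only sees a counter such as name_total as one sample per scrape, rates shown in Grafana are derived from the differences between samples. The sketch below illustrates that idea in simplified form for two consecutive scrapes, treating a drop in value as a counter reset (for example after a node restart). It is not Prometheus's implementation: the built-in rate() function works over a whole query window and also extrapolates to the window boundaries. The Sample type and per_second_rate function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # seconds since the epoch when the scrape happened
    value: float      # counter value exposed by IoTDB at that moment


def per_second_rate(prev: Sample, curr: Sample) -> float:
    """Simplified per-second increase between two scrapes of a counter."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = curr.value - prev.value
    if increase < 0:
        # A counter only decreases when it was reset, so the whole current
        # value is counted as the increase since the reset.
        increase = curr.value
    return increase / elapsed
```

With the 15 s scrape interval used in the configuration example below, going from 100 to 250 between two scrapes gives a rate of 150 / 15 = 10 per second.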

Therefore, Prometheus and Grafana need to be configured and deployed in addition to IoTDB.

For instance, you can configure Prometheus as follows to scrape metrics from IoTDB:

    scrape_configs:
      - job_name: pull-metrics
        honor_labels: true
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              # host:port of the IoTDB metrics endpoint (dn_metric_prometheus_reporter_port)
              - localhost:9091
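A cluster usually has several nodes, and each node's metrics endpoint must be listed under targets. The sketch below generates a Prometheus configuration equivalent to the single-target example above for a list of host:port strings, wrapped in the top-level scrape_configs list that prometheus.yml expects. The function name scrape_config and the host names in the usage example are hypothetical; each port must match the node's configured reporter port (for a DataNode, dn_metric_prometheus_reporter_port).

```python
from typing import List


def scrape_config(job_name: str, targets: List[str],
                  interval: str = "15s", timeout: str = "10s") -> str:
    """Hypothetical helper: build a scrape_configs section for IoTDB nodes."""
    if not targets:
        raise ValueError("at least one target is required")
    # Prometheus rejects a scrape_timeout larger than the scrape_interval,
    # so keep the two values consistent when changing them.
    lines = [
        "scrape_configs:",
        f"  - job_name: {job_name}",
        "    honor_labels: true",
        "    honor_timestamps: true",
        f"    scrape_interval: {interval}",
        f"    scrape_timeout: {timeout}",
        "    metrics_path: /metrics",
        "    scheme: http",
        "    follow_redirects: true",
        "    static_configs:",
        "      - targets:",
    ]
    # One list entry per node metrics endpoint.
    lines += [f"          - {t}" for t in targets]
    return "\n".join(lines) + "\n"
```

For example, scrape_config("pull-metrics", ["node1:9091", "node2:9091"]) produces the same job as above with two targets instead of localhost:9091.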

The following documentation may help you get started with Prometheus and Grafana:

Prometheus getting_started

Prometheus scrape metrics

Grafana getting_started

Grafana query metrics from Prometheus

Apache IoTDB Dashboard

The Apache IoTDB Dashboard is designed for unified, centralized operations and maintenance: with it, multiple clusters can be monitored from a single panel.

[Figures: Apache IoTDB Dashboard]

The Dashboard's JSON file is available in the enterprise edition.

Cluster Overview

Including but not limited to:

  • Total cluster CPU cores, memory space, and hard disk space.
  • Number of ConfigNodes and DataNodes in the cluster.
  • Cluster uptime duration.
  • Cluster write speed.
  • Current CPU, memory, and disk usage across all nodes in the cluster.
  • Information on individual nodes.

[Figure: Cluster Overview]

Data Writing

Including but not limited to:

  • Average write latency, median latency, and 99th percentile latency.
  • Number and size of WAL files.
  • WAL flush SyncBuffer latency per node.

[Figure: Data Writing]

Data Querying

Including but not limited to:

  • Per-node time spent loading time series metadata.
  • Per-node time spent reading time series.
  • Per-node time spent modifying time series metadata.
  • Per-node time spent loading Chunk metadata lists.
  • Per-node time spent modifying Chunk metadata.
  • Per-node time spent filtering by Chunk metadata.
  • Average time spent constructing a Chunk Reader.

[Figure: Data Querying]

Storage Engine

Including but not limited to:

  • File count and sizes by type.
  • The count and size of TsFiles at various stages.
  • Number and duration of various tasks.

[Figure: Storage Engine]

System Monitoring

Including but not limited to:

  • System memory, swap memory, and process memory.
  • Disk space, file count, and file sizes.
  • JVM GC time percentage, GC occurrences by type, GC volume, and heap memory usage across generations.
  • Network transmission rate and packet sending rate.

[Figures: System Monitoring]