Monitor

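You can monitor a Pulsar cluster in several ways. Both the metrics related to topic usage and the overall health metrics of each cluster component help you gauge whether the cluster is healthy.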

Collect metrics

You can collect stats for brokers, ZooKeeper, and BookKeeper.

Broker stats

You can collect Pulsar broker metrics from brokers and export them in JSON format. Broker metrics fall into two main types:

  • Destination dumps, which contain stats for each individual topic. You can fetch the destination dumps using the command below:

    bin/pulsar-admin broker-stats destinations
  • Broker metrics, which contain the broker information and topics stats aggregated at namespace level. You can fetch the broker metrics by using the following command:

    bin/pulsar-admin broker-stats monitoring-metrics

All the metrics are updated once every minute.
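As a rough illustration of consuming the JSON output from a script, the sketch below wraps the `broker-stats` commands shown above. It assumes that the command prints a single JSON document to stdout and that `bin/pulsar-admin` is reachable from the working directory. The helper names `parse_admin_json` and `fetch_broker_stats` are hypothetical and not part of Pulsar.

```python
import json
import subprocess


def parse_admin_json(stdout: str):
    """Decode the JSON document printed by a pulsar-admin stats command.

    Assumption: stdout contains exactly one JSON document and nothing else.
    """
    return json.loads(stdout)


def fetch_broker_stats(subcommand: str, admin_path: str = "bin/pulsar-admin"):
    """Run `pulsar-admin broker-stats <subcommand>` and return the decoded JSON.

    `subcommand` is one of the commands from this guide, for example
    "destinations" or "monitoring-metrics".
    """
    completed = subprocess.run(
        [admin_path, "broker-stats", subcommand],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_admin_json(completed.stdout)
```

Calling `fetch_broker_stats("monitoring-metrics")` from the Pulsar installation directory should return the decoded document. Because the metrics refresh once per minute, polling more often than that is unlikely to yield new values.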

The aggregated broker metrics are also exposed in the Prometheus format at:

  http://$BROKER_ADDRESS:8080/metrics
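To inspect this endpoint without a full Prometheus setup, a small script can fetch and parse the text output. The following is a minimal sketch, not a complete parser: it assumes label values contain no commas, escaped quotes, or closing braces, and it ignores optional timestamps. The function names are illustrative.

```python
import urllib.request


def parse_prometheus_text(text: str):
    """Parse Prometheus text-format lines into (name, labels, value) tuples.

    Simplified on purpose: label values must not contain commas, escaped
    quotes, or closing braces; optional timestamps are ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        labels = {}
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, value_part = rest.rsplit("}", 1)
            for pair in label_part.split(","):
                if pair:
                    key, _, raw = pair.partition("=")
                    labels[key.strip()] = raw.strip().strip('"')
        else:
            name, _, value_part = line.partition(" ")
        value = float(value_part.split()[0])
        samples.append((name.strip(), labels, value))
    return samples


def fetch_metrics(url: str, timeout: float = 5.0):
    """Download a /metrics page and return its parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))
```

For example, `fetch_metrics("http://broker.example.com:8080/metrics")` (a placeholder host) returns one tuple per sample line. For anything beyond a quick check, a maintained Prometheus client library is likely a safer choice than this hand-rolled parser.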

ZooKeeper stats

The local ZooKeeper, the configuration store server, and the clients that ship with Pulsar can all expose detailed stats through Prometheus.

  http://$LOCAL_ZK_SERVER:8000/metrics
  http://$GLOBAL_ZK_SERVER:8001/metrics

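The default port of the local ZooKeeper is 8000 and the default port of the configuration store is 8001. You can change the default ports of the local ZooKeeper and the configuration store by modifying the stats_server_port setting.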

BookKeeper stats

You can change the stats framework that BookKeeper uses by modifying the statsProviderClass setting in the conf/bookkeeper.conf configuration file.

The default BookKeeper configuration, which is included with the Pulsar distribution, enables the Prometheus exporter.

  http://$BOOKIE_ADDRESS:8000/metrics

The default port for bookie is 8000. You can change the port by configuring prometheusStatsHttpPort in the conf/bookkeeper.conf file.
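The default ports mentioned so far (8080 for brokers, 8000 for the local ZooKeeper, 8001 for the configuration store, and 8000 for bookies) can be collected in one place when scripting health checks. The sketch below is only an illustration; the component keys and the `metrics_urls` helper are made-up names, and the ports must be overridden if stats_server_port or prometheusStatsHttpPort has been changed.

```python
# Default metrics ports as described in this guide.
DEFAULT_PORTS = {
    "broker": 8080,
    "local_zookeeper": 8000,
    "configuration_store": 8001,
    "bookie": 8000,
}


def metrics_urls(hosts_by_component, ports=None):
    """Return the /metrics URL of every host, grouped input, flat output.

    hosts_by_component maps a component key (see DEFAULT_PORTS) to a list
    of host names. `ports` optionally overrides individual default ports.
    """
    effective_ports = {**DEFAULT_PORTS, **(ports or {})}
    urls = []
    for component, hosts in hosts_by_component.items():
        port = effective_ports[component]
        for host in hosts:
            urls.append(f"http://{host}:{port}/metrics")
    return urls
```

For instance, `metrics_urls({"bookie": ["bk1"]}, {"bookie": 9000})` reflects a deployment where the bookie port was changed to 9000.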

Managed cursor acknowledgment state

The acknowledgment state is persisted to the ledger first. If persisting to the ledger fails, the state is persisted to ZooKeeper instead. To track acknowledgment stats, you can configure the following metrics for the managed cursor.

  brk_ml_cursor_persistLedgerSucceed(namespace="", ledger_name="", cursor_name="")
  brk_ml_cursor_persistLedgerErrors(namespace="", ledger_name="", cursor_name="")
  brk_ml_cursor_persistZookeeperSucceed(namespace="", ledger_name="", cursor_name="")
  brk_ml_cursor_persistZookeeperErrors(namespace="", ledger_name="", cursor_name="")
  brk_ml_cursor_nonContiguousDeletedMessagesRange(namespace="", ledger_name="", cursor_name="")

These metrics are exposed through the Prometheus interface, so you can monitor and check them in Grafana.
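One way to use these metrics is to watch how often the ledger write fails and the cursor falls back to ZooKeeper. The sketch below computes a ledger error ratio per cursor from already-parsed samples. It assumes that each sample is a `(metric_name, labels, value)` tuple and that the values can be read as counts over the same reporting interval; both are assumptions made for illustration, not something this guide specifies.

```python
from collections import defaultdict

SUCCEED = "brk_ml_cursor_persistLedgerSucceed"
ERRORS = "brk_ml_cursor_persistLedgerErrors"


def ledger_persist_error_ratio(samples):
    """Return {(namespace, ledger_name, cursor_name): error ratio}.

    samples: iterable of (metric_name, labels_dict, value) tuples.
    Cursors with no recorded attempts are left out of the result.
    """
    totals = defaultdict(lambda: {"ok": 0.0, "err": 0.0})
    for name, labels, value in samples:
        key = (
            labels.get("namespace", ""),
            labels.get("ledger_name", ""),
            labels.get("cursor_name", ""),
        )
        if name == SUCCEED:
            totals[key]["ok"] += value
        elif name == ERRORS:
            totals[key]["err"] += value
    ratios = {}
    for key, counts in totals.items():
        attempts = counts["ok"] + counts["err"]
        if attempts > 0:
            ratios[key] = counts["err"] / attempts
    return ratios
```

A ratio that stays above zero for a cursor suggests that acknowledgment state is regularly being written to ZooKeeper rather than to the ledger, which may be worth alerting on.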

Function and connector stats

You can collect functions worker stats, which include the functions worker JVM metrics, from functions-worker and export them in JSON format.

  pulsar-admin functions-worker monitoring-metrics

You can collect function and connector metrics from functions-worker and export them in JSON format.

  pulsar-admin functions-worker function-stats

The aggregated function and connector metrics are exposed in Prometheus format at the address below. You can get FUNCTIONS_WORKER_ADDRESS and WORKER_PORT from the functions_worker.yml file.

  http://$FUNCTIONS_WORKER_ADDRESS:$WORKER_PORT/metrics

Configure Prometheus

You can use Prometheus to collect all the metrics exposed by Pulsar components and use Grafana to display them. This gives you a way to monitor your Pulsar cluster. For details, refer to Prometheus guide.

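When you run Pulsar on bare metal, you need to provide the list of nodes to be probed. When you deploy Pulsar in a Kubernetes cluster, the monitoring system is set up automatically. For details, refer to Kubernetes instructions.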
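On bare metal, one way to hand Prometheus the node list is its file-based service discovery, which reads target groups from a JSON file. The sketch below generates such a file. It assumes that your Prometheus scrape configuration has a `file_sd_configs` entry pointing at the output path; the `component` label and the helper names are choices made for this example only.

```python
import json


def build_file_sd_targets(nodes_by_component):
    """Build a Prometheus file_sd document: one target group per component.

    nodes_by_component maps a component name to a list of "host:port"
    strings. Output is sorted so that regenerating the file is stable.
    """
    groups = []
    for component, nodes in sorted(nodes_by_component.items()):
        groups.append(
            {"targets": sorted(nodes), "labels": {"component": component}}
        )
    return groups


def write_file_sd(path, nodes_by_component):
    """Write the target groups as JSON to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_file_sd_targets(nodes_by_component), handle, indent=2)
```

For example, `write_file_sd("pulsar-targets.json", {"broker": ["broker-1:8080"], "bookie": ["bookie-1:8000"]})` writes two target groups using placeholder host names. Regenerating the file when nodes are added or removed avoids editing the main Prometheus configuration by hand.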

Dashboards

When you collect time series statistics, the major challenge is to make sure that the number of dimensions attached to the data does not explode. For this reason, you only need to collect time series metrics at the namespace level and then aggregate them.
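To make the idea concrete, the sketch below rolls per-topic samples up to the namespace level by dropping the topic dimension. It assumes each sample is a `(metric_name, labels, value)` tuple whose labels include a `namespace` key, and that summing is the right aggregation for the metric in question (true for rates and counts, not for averages or percentiles). The metric name in the usage note is made up.

```python
from collections import defaultdict


def aggregate_by_namespace(samples):
    """Sum sample values per (metric_name, namespace), dropping other labels.

    samples: iterable of (metric_name, labels_dict, value) tuples.
    The number of output series depends on the number of namespaces,
    not on the number of topics.
    """
    totals = defaultdict(float)
    for name, labels, value in samples:
        totals[(name, labels["namespace"])] += value
    return dict(totals)
```

With three topic-level samples of a hypothetical `msg_rate_in` metric spread over two namespaces, the function returns two series. A namespace with thousands of topics would still contribute a single series per metric.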

Pulsar per-topic dashboard

Pulsar Manager provides a per-topic dashboard.

Grafana

You can use Grafana to create dashboards driven by the data that is stored in Prometheus.

When you deploy Pulsar on Kubernetes, a pulsar-grafana Docker image is enabled by default. You can use the Docker image with the principal dashboards.

Enter the command below to use the dashboard manually:

  docker run -p3000:3000 \
    -e PROMETHEUS_URL=http://$PROMETHEUS_HOST:9090/ \
    apachepulsar/pulsar-grafana:latest

The following are some examples of Grafana dashboards:

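  • pulsar-grafana: a Grafana dashboard that displays the metrics collected by Prometheus for Pulsar clusters running on Kubernetes.
  • apache-pulsar-grafana-dashboard: a collection of Grafana dashboard templates for different Pulsar components. You can use it both on Kubernetes and on local machines.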

Alerting rules

You can set alerting rules according to your Pulsar environment. To configure alerting rules for Apache Pulsar, refer to the StreamNative platform examples or the Alert Manager alerting rules.