Kafka Exporter

grafana-agent embeds kafka_exporter to collect metrics from Kafka.

We strongly recommend running grafana-agent under a dedicated account with least-privilege access to the Kafka instance, to avoid the security risks that come with over-broad permissions. See the documentation for more details.

Configure and enable kafka_exporter

```yaml
kafka_exporter:
  enabled: true
  # Address array (host:port) of Kafka server
  kafka_uris: ['xxx','yyy']
```
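For context, the snippet above sits under the Agent's `integrations` block. A fuller sketch is shown below; the remote_write URL and broker addresses are placeholders, not values from this document, and the top-level `metrics` key follows recent static-mode Agent releases (older releases used `prometheus`):

```yaml
# Minimal grafana-agent config sketch. URL and broker addresses are placeholders.
metrics:
  global:
    scrape_interval: 1m
  configs:
    - name: default
      remote_write:
        - url: http://localhost:9090/api/v1/write

integrations:
  kafka_exporter:
    enabled: true
    # Address array (host:port) of Kafka server
    kafka_uris: ['broker-1:9092', 'broker-2:9092']
```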

Key metrics collected

  - kafka_brokers: Number of brokers in the Kafka cluster (gauge)
  - kafka_topic_partitions: Number of partitions for this Topic (gauge)
  - kafka_topic_partition_current_offset: Current Offset of a Broker at Topic/Partition (gauge)
  - kafka_consumergroup_current_offset: Current Offset of a ConsumerGroup at Topic/Partition (gauge)
  - kafka_consumer_lag_millis: Current approximation of consumer lag for a ConsumerGroup at Topic/Partition (gauge)
  - kafka_topic_partition_under_replicated_partition: 1 if Topic/Partition is under-replicated (gauge)
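The two offset metrics above can be combined to derive per-partition consumer lag in messages. One way to do this is a Prometheus recording rule such as the following sketch; the rule name is illustrative, not part of the exporter:

```yaml
groups:
  - name: kafka-lag
    rules:
      # Per-partition lag in messages: newest broker offset minus the
      # consumer group's committed offset. group_right keeps the
      # consumergroup label from the right-hand side.
      - record: kafka:consumergroup_lag
        expr: >
          kafka_topic_partition_current_offset
            - on(topic, partition) group_right()
          kafka_consumergroup_current_offset
```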

Full configuration reference

```yaml
# Enables the kafka_exporter integration, allowing the Agent to automatically
# collect metrics from the configured Kafka server addresses.
[enabled: <boolean> | default = false]

# Sets an explicit value for the instance label when the integration is
# self-scraped. Overrides inferred values.
#
# The default value for this integration is inferred from the hostname
# portion of the first kafka_uris value. If there is more than one string
# in kafka_uris, the integration will fail to load and an instance value
# must be manually provided.
[instance: <string>]

# Automatically collect metrics from this integration. If disabled,
# the kafka_exporter integration will be run but not scraped and thus not
# remote-written. Metrics for the integration will be exposed at
# /integrations/kafka_exporter/metrics and can be scraped by an external
# process.
[scrape_integration: <boolean> | default = <integrations_config.scrape_integrations>]

# How often the metrics should be collected. Defaults to
# prometheus.global.scrape_interval.
[scrape_interval: <duration> | default = <global_config.scrape_interval>]

# The timeout before considering the scrape a failure. Defaults to
# prometheus.global.scrape_timeout.
[scrape_timeout: <duration> | default = <global_config.scrape_timeout>]

# Allows for relabeling labels on the target.
relabel_configs:
  [- <relabel_config> ... ]

# Relabel metrics coming from the integration, allowing you to drop series
# from the integration that you don't care about.
metric_relabel_configs:
  [- <relabel_config> ... ]

# How frequently to truncate the WAL for this integration.
[wal_truncate_frequency: <duration> | default = "60m"]

# Monitor the exporter itself and include those metrics in the results.
[include_exporter_metrics: <bool> | default = false]

# Address array (host:port) of Kafka server
[kafka_uris: <[]string>]

# Connect using SASL/PLAIN
[use_sasl: <bool>]

# Only set this to false if using a non-Kafka SASL proxy
[use_sasl_handshake: <bool> | default = true]

# SASL user name
[sasl_username: <string>]

# SASL user password
[sasl_password: <string>]

# The SASL SCRAM SHA algorithm: sha256 or sha512
[sasl_mechanism: <string>]

# Connect using TLS
[use_tls: <bool>]

# The optional certificate authority file for TLS client authentication
[ca_file: <string>]

# The optional certificate file for TLS client authentication
[cert_file: <string>]

# The optional key file for TLS client authentication
[key_file: <string>]

# If true, the server's certificate will not be checked for validity. This
# makes your HTTPS connections insecure.
[insecure_skip_verify: <bool>]

# Kafka broker version
[kafka_version: <string> | default = "2.0.0"]

# If you need to use a group from ZooKeeper
[use_zookeeper_lag: <bool>]

# Address array (hosts) of ZooKeeper server(s)
[zookeeper_uris: <[]string>]

# Kafka cluster name
[kafka_cluster_name: <string>]

# Metadata refresh interval
[metadata_refresh_interval: <duration> | default = "1m"]

# If true, all scrapes trigger Kafka operations; otherwise, they share
# results. WARNING: this should be disabled on large clusters.
[allow_concurrency: <bool> | default = true]

# Maximum number of offsets to store in the interpolation table for a partition
[max_offsets: <int> | default = 1000]

# How frequently the interpolation table should be pruned, in seconds
[prune_interval_seconds: <int> | default = 30]

# Regex filter for topics to be monitored
[topics_filter_regex: <string> | default = ".*"]

# Regex filter for consumer groups to be monitored
[groups_filter_regex: <string> | default = ".*"]
```
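As a worked example of how the options above combine, here is a sketch for a cluster that requires SASL/SCRAM over TLS. Broker addresses, credentials, file paths, and the filter regexes are all placeholders:

```yaml
kafka_exporter:
  enabled: true
  kafka_uris: ['broker-1:9093', 'broker-2:9093']
  # Authenticate with SASL/SCRAM-SHA-512
  use_sasl: true
  sasl_username: agent
  sasl_password: secret
  sasl_mechanism: sha512
  # Encrypt the connection and verify the broker certificate
  use_tls: true
  ca_file: /etc/kafka/ca.pem
  # Limit collection to the topics and consumer groups of interest
  topics_filter_regex: 'orders.*'
  groups_filter_regex: 'billing.*'
```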