Node Exporter

grafana-agent ships with node_exporter built in; you can enable it by defining a node_exporter_config block under the integrations section of the configuration file.

Configuring and enabling node_exporter

The following example configuration enables node_exporter; the generated configuration file is saved as ./grafana-agent-cfg.yaml:

```shell
cat <<EOF > ./grafana-agent-cfg.yaml
# Configuration of grafana-agent itself
server:
  log_level: info
  http_listen_port: 12345
# Metrics scraping configuration (similar to prometheus scrape_configs)
metrics:
  global:
    scrape_interval: 15s
    scrape_timeout: 10s
    remote_write:
      - url: https://n9e-server:19000/prometheus/v1/write
        basic_auth:
          username: <string>
          password: <string>
integrations:
  node_exporter:
    enabled: true
EOF
```

Note: remote_write can be configured in the global section, or a separate remote_write endpoint can be configured for integrations.
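As a sketch of the second option (assuming the integrations-level prometheus_remote_write field of grafana-agent; the hostname another-n9e-server is a hypothetical example), integration metrics can be routed to a different endpoint than the global one:

```yaml
metrics:
  global:
    remote_write:
      - url: https://n9e-server:19000/prometheus/v1/write
integrations:
  # Metrics from integrations go here instead of the global endpoint.
  prometheus_remote_write:
    - url: https://another-n9e-server:19000/prometheus/v1/write
  node_exporter:
    enabled: true
```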

After restarting grafana-agent, use the following two commands to verify that node_exporter is working as expected.

curl http://localhost:12345/integrations/node_exporter/metrics should produce output like the following:

```
node_boot_time_seconds 1.643256088e+09
node_context_switches_total 1.5136425575e+10
node_cooling_device_cur_state{name="0",type="Processor"} 0
node_cooling_device_cur_state{name="1",type="Processor"} 0
node_cooling_device_cur_state{name="2",type="Processor"} 0
node_cooling_device_cur_state{name="3",type="Processor"} 0
node_cooling_device_max_state{name="0",type="Processor"} 0
node_cooling_device_max_state{name="1",type="Processor"} 0
node_cooling_device_max_state{name="2",type="Processor"} 0
node_cooling_device_max_state{name="3",type="Processor"} 0
node_cpu_seconds_total{cpu="0",mode="idle"} 1.66906519e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 5031.48
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 82.84
node_cpu_seconds_total{cpu="0",mode="softirq"} 2332.39
```

curl http://localhost:12345/agent/api/v1/targets | jq should produce output like the following:

```json
{
  "status": "success",
  "data": [
    {
      "instance": "b81030837ec7f1d162489cb4009325c9",
      "target_group": "integrations/node_exporter",
      "endpoint": "http://127.0.0.1:12345/integrations/node_exporter/metrics",
      "state": "up",
      "labels": {
        "agent_hostname": "tt-fc-dev01.nj",
        "instance": "tt-fc-dev01.nj:12345",
        "job": "integrations/node_exporter"
      },
      "discovered_labels": {
        "__address__": "127.0.0.1:12345",
        "__metrics_path__": "/integrations/node_exporter/metrics",
        "__scheme__": "http",
        "__scrape_interval__": "15s",
        "__scrape_timeout__": "10s",
        "agent_hostname": "tt-fc-dev01.nj",
        "job": "integrations/node_exporter"
      },
      "last_scrape": "2022-02-16T18:53:08.79288957+08:00",
      "scrape_duration_ms": 20,
      "scrape_error": ""
    },
    {
      "instance": "b81030837ec7f1d162489cb4009325c9",
      "target_group": "local_scrape",
      "endpoint": "http://127.0.0.1:12345/metrics",
      "state": "up",
      "labels": {
        "cluster": "txnjdev01",
        "instance": "127.0.0.1:12345",
        "job": "local_scrape"
      },
      "discovered_labels": {
        "__address__": "127.0.0.1:12345",
        "__metrics_path__": "/metrics",
        "__scheme__": "http",
        "__scrape_interval__": "15s",
        "__scrape_timeout__": "10s",
        "cluster": "txnjdev01",
        "job": "local_scrape"
      },
      "last_scrape": "2022-02-16T18:53:22.336820442+08:00",
      "scrape_duration_ms": 4,
      "scrape_error": ""
    }
  ]
}
```

As the response shows, the targets list now contains a new instance whose job is integrations/node_exporter, which indicates that node_exporter is working correctly.
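If jq is not available, a minimal grep-based check can confirm the target is registered and up. The sketch below runs against an abbreviated sample of the targets response; in practice you would feed it the live output of curl -s http://localhost:12345/agent/api/v1/targets instead of the embedded sample.

```shell
# check_target: succeeds if the targets JSON contains an "up"
# integrations/node_exporter target (tolerates optional whitespace after ':').
check_target() {
  echo "$1" | grep -q '"target_group": *"integrations/node_exporter"' &&
  echo "$1" | grep -q '"state": *"up"'
}

# Abbreviated sample of the /agent/api/v1/targets response shown above.
sample='{"status":"success","data":[{"target_group":"integrations/node_exporter","state":"up"}]}'

if check_target "$sample"; then
  echo "node_exporter target is up"
fi
```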

Note: if grafana-agent runs inside a container, make the following adjustments:

  1. When starting the container, map the relevant host directories into it, i.e. -v "/:/host/root" -v "/sys:/host/sys" -v "/proc:/host/proc", as shown below:

   ```shell
   docker run \
     --net="host" \
     --pid="host" \
     --cap-add=SYS_TIME \
     -d \
     -v "/:/host/root:ro" \
     -v "/sys:/host/sys:ro" \
     -v "/proc:/host/proc:ro" \
     -v /tmp/grafana-agent:/etc/agent/data \
     -v /tmp/grafana-agent-config.yaml:/etc/agent/agent.yaml \
     grafana/agent:v0.23.0 \
     --config.file=/etc/agent/agent.yaml \
     --metrics.wal-directory=/etc/agent/data
   ```
  2. In the configuration file /tmp/grafana-agent-config.yaml, the node_exporter section must point rootfs/sysfs/procfs at their paths inside the container. You can generate such a test configuration with the following command (replace remote_write with an address appropriate for your environment):

   ```shell
   cat <<EOF > /tmp/grafana-agent-config.yaml
   server:
     log_level: info
     http_listen_port: 12345
   metrics:
     global:
       scrape_interval: 15s
       scrape_timeout: 10s
       remote_write:
         - url: https://n9e-server:19000/prometheus/v1/write
           basic_auth:
             username: <string>
             password: <string>
   integrations:
     node_exporter:
       enabled: true
       rootfs_path: /host/root
       sysfs_path: /host/sys
       procfs_path: /host/proc
   EOF
   ```

Note: if grafana-agent runs in a K8s environment, adjust as follows:

  1. We recommend storing the grafana-agent configuration in a ConfigMap; the manifest is as follows:

   ```shell
   cat <<EOF |
   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: grafana-agent
     namespace: ${NAMESPACE}
   data:
     agent.yaml: |
       server:
         log_level: info
         http_listen_port: 12345
       metrics:
         global:
           scrape_interval: 15s
           remote_write:
             - url: 'https://n9e-server:19000/prometheus/v1/write'
               basic_auth:
                 username: ${FC_USERNAME}
                 password: ${FC_PASSWORD}
       integrations:
         agent:
           enabled: true
         node_exporter:
           enabled: true
   EOF
   envsubst | kubectl apply -f -
   kubectl describe configmap grafana-agent
   ```
  2. Generate the grafana-agent pod manifest as follows, and create the corresponding Pod:

   ```shell
   cat <<EOF |
   apiVersion: v1
   kind: Pod
   metadata:
     name: grafana-agent
     namespace: ${NAMESPACE}
   spec:
     containers:
       - image: grafana/agent:v0.23.0
         name: grafana-agent
         args:
           - --config.file=/fcetc/agent.yaml
           - --metrics.wal-directory=/etc/agent/data
         securityContext:
           capabilities:
             add: ["SYS_TIME"]
           privileged: true
           runAsUser: 0
         volumeMounts:
           - name: rootfs
             mountPath: /host/root
             readOnly: true
           - name: sysfs
             mountPath: /host/sys
             readOnly: true
           - name: procfs
             mountPath: /host/proc
             readOnly: true
           - name: fccfg
             mountPath: /fcetc
     hostPID: true
     hostNetwork: true
     dnsPolicy: ClusterFirstWithHostNet
     volumes:
       - name: rootfs
         hostPath:
           path: /
       - name: sysfs
         hostPath:
           path: /sys
       - name: procfs
         hostPath:
           path: /proc
       - name: fccfg
         configMap:
           name: grafana-agent
   EOF
   envsubst | kubectl apply -f -
   kubectl logs grafana-agent  # view the grafana-agent logs
   ```

Key metrics collected by node_exporter, explained

```
# SYSTEM
# Number of CPU context switches
node_context_switches_total: context_switches
# Number of interrupts
node_intr_total: Interrupts
# Number of running processes
node_procs_running: Processes in runnable state
# Size of the entropy pool
node_entropy_available_bits: Entropy available to random number generators
node_time_seconds: System time in seconds since epoch (1970)
node_boot_time_seconds: Node boot time, in unixtime

# CPU
node_cpu_seconds_total: Seconds the CPUs spent in each mode
node_load1: cpu load 1m
node_load5: cpu load 5m
node_load15: cpu load 15m

# MEM
# Kernel space
# Memory tracking pages fetched from swap but not yet modified
node_memory_SwapCached_bytes: Memory that keeps track of pages that have been fetched from swap but not yet been modified
# Memory the kernel uses to cache data structures for its own use
node_memory_Slab_bytes: Memory used by the kernel to cache data structures for its own use
# Reclaimable part of Slab
node_memory_SReclaimable_bytes: SReclaimable - Part of Slab, that might be reclaimed, such as caches
# Unreclaimable part of Slab
node_memory_SUnreclaim_bytes: Part of Slab, that cannot be reclaimed on memory pressure
# Size of the vmalloc memory area
node_memory_VmallocTotal_bytes: Total size of vmalloc memory area
# Memory already allocated from vmalloc; contiguous in virtual address space
node_memory_VmallocUsed_bytes: Amount of vmalloc area which is used
# Largest free contiguous block in the vmalloc area, i.e. the largest contiguous allocation vmalloc can still satisfy
node_memory_VmallocChunk_bytes: Largest contiguous block of vmalloc area which is free
# Total size of memory pages removed because of hardware faults
node_memory_HardwareCorrupted_bytes: Amount of RAM that the kernel identified as corrupted / not working
# Memory used to map between virtual and physical memory addresses
node_memory_PageTables_bytes: Memory used to map between virtual and physical memory addresses (gauge)
# Kernel stack memory; resident and not reclaimable
node_memory_KernelStack_bytes: Kernel memory stack. This is not reclaimable
# Temporary buffers used to copy high memory for access ("bounce buffering"), which degrades I/O performance
node_memory_Bounce_bytes: Memory used for block device bounce buffers

# User space
# Size of a single huge page
node_memory_Hugepagesize_bytes: Huge Page size
# Number of resident huge pages allocated by the system
node_memory_HugePages_Total: Total size of the pool of huge pages
# Number of free huge pages
node_memory_HugePages_Free: Huge pages in the pool that are not yet allocated
# Huge pages reserved by processes but not yet used
node_memory_HugePages_Rsvd: Huge pages for which a commitment to allocate from the pool has been made, but no allocation
# Huge pages above the configured number of resident huge pages
node_memory_HugePages_Surp: Huge pages in the pool above the value in /proc/sys/vm/nr_hugepages
# Transparent HugePages (THP)
node_memory_AnonHugePages_bytes: Memory in anonymous huge pages
# File-backed memory on the inactive LRU list
node_memory_Inactive_file_bytes: File-backed memory on inactive LRU list
# Anonymous memory on the inactive LRU list
node_memory_Inactive_anon_bytes: Anonymous and swap cache on inactive LRU list, including tmpfs (shmem)
# File-backed memory on the active LRU list
node_memory_Active_file_bytes: File-backed memory on active LRU list
# Anonymous memory on the active LRU list
node_memory_Active_anon_bytes: Anonymous and swap cache on active least-recently-used (LRU) list, including tmpfs
# Pages that must not be swapped out (the Unevictable LRU list)
node_memory_Unevictable_bytes: Amount of unevictable memory that can't be swapped out for a variety of reasons
# Shared memory
node_memory_Shmem_bytes: Used shared memory (shared between several processes, thus including RAM disks)
# Size of anonymous page memory
node_memory_AnonPages_bytes: Memory in user pages not backed by files
# Size of mapped memory pages
node_memory_Mapped_bytes: Used memory in mapped pages files which have been mmaped, such as libraries
# Size of the file-backed page cache
node_memory_Cached_bytes: Parked file data (file content) cache
# Anonymous pages that were swapped out, then swapped back in, whose contents have not changed since swap-in
node_memory_SwapCached_bytes: Memory that keeps track of pages that have been fetched from swap but not yet been modified
# Memory locked with the mlock() system call
node_memory_Mlocked_bytes: Size of pages locked to memory using the mlock() system call
# Cache pages occupied by block devices
node_memory_Buffers_bytes: Block device (e.g. harddisk) cache
node_memory_SwapTotal_bytes: Memory information field SwapTotal_bytes
node_memory_SwapFree_bytes: Memory information field SwapFree_bytes

# DISK
node_filesystem_avail_bytes: Filesystem space available to non-root users in bytes
node_filesystem_free_bytes: Filesystem free space in bytes
node_filesystem_size_bytes: Filesystem size in bytes
node_filesystem_files_free: Filesystem total free file nodes
node_filesystem_files: Filesystem total file nodes
node_filefd_maximum: Max open files
node_filefd_allocated: Open files
node_filesystem_readonly: Filesystem read-only status
node_filesystem_device_error: Whether an error occurred while getting statistics for the given device
node_disk_reads_completed_total: The total number of reads completed successfully
node_disk_writes_completed_total: The total number of writes completed successfully
node_disk_reads_merged_total: The number of reads merged
node_disk_writes_merged_total: The number of writes merged
node_disk_read_bytes_total: The total number of bytes read successfully
node_disk_written_bytes_total: The total number of bytes written successfully
node_disk_io_time_seconds_total: Total seconds spent doing I/Os
node_disk_read_time_seconds_total: The total number of seconds spent by all reads
node_disk_write_time_seconds_total: The total number of seconds spent by all writes
node_disk_io_time_weighted_seconds_total: The weighted # of seconds spent doing I/Os

# NET
node_network_receive_bytes_total: Network device statistic receive_bytes (counter)
node_network_transmit_bytes_total: Network device statistic transmit_bytes (counter)
node_network_receive_packets_total: Network device statistic receive_packets
node_network_transmit_packets_total: Network device statistic transmit_packets
node_network_receive_errs_total: Network device statistic receive_errs
node_network_transmit_errs_total: Network device statistic transmit_errs
node_network_receive_drop_total: Network device statistic receive_drop
node_network_transmit_drop_total: Network device statistic transmit_drop
node_nf_conntrack_entries: Number of currently allocated flow entries for connection tracking
node_sockstat_TCP_alloc: Number of TCP sockets in state alloc
node_sockstat_TCP_inuse: Number of TCP sockets in state inuse
node_sockstat_TCP_orphan: Number of TCP sockets in state orphan
node_sockstat_TCP_tw: Number of TCP sockets in state tw
node_netstat_Tcp_CurrEstab: Statistic TcpCurrEstab
node_sockstat_sockets_used: Number of IPv4 sockets in use
```
Complete node_exporter integration configuration reference

```yaml
# Enables the node_exporter integration, allowing the Agent to automatically
# collect system metrics from the host UNIX system.
[enabled: <boolean> | default = false]

# Sets an explicit value for the instance label when the integration is
# self-scraped. Overrides inferred values.
#
# The default value for this integration is inferred from the agent hostname
# and HTTP listen port, delimited by a colon.
[instance: <string>]

# Automatically collect metrics from this integration. If disabled, the
# node_exporter integration will be run but not scraped and thus not
# remote-written. Metrics for the integration will be exposed at
# /integrations/node_exporter/metrics and can be scraped by an external
# process.
[scrape_integration: <boolean> | default = <integrations_config.scrape_integrations>]

# How often the metrics should be collected. Defaults to
# prometheus.global.scrape_interval.
[scrape_interval: <duration> | default = <global_config.scrape_interval>]

# The timeout before considering the scrape a failure. Defaults to
# prometheus.global.scrape_timeout.
[scrape_timeout: <duration> | default = <global_config.scrape_timeout>]

# Allows for relabeling labels on the target.
relabel_configs:
  [- <relabel_config> ... ]

# Relabel metrics coming from the integration, allowing you to drop series
# from the integration that you don't care about.
metric_relabel_configs:
  [ - <relabel_config> ... ]

# How frequently to truncate the WAL for this integration.
[wal_truncate_frequency: <duration> | default = "60m"]

# Monitor the exporter itself and include those metrics in the results.
[include_exporter_metrics: <boolean> | default = false]

# Optionally defines the list of enabled-by-default collectors.
# Anything not provided in the list below will be disabled by default,
# but requires at least one element to be treated as defined.
#
# This is useful if you have a very explicit set of collectors you wish
# to run.
set_collectors:
  - [<string>]

# Additional collectors to enable on top of the default set of enabled
# collectors or on top of the list provided by set_collectors.
#
# This is useful if you have a few collectors you wish to run that are
# not enabled by default, but do not want to explicitly provide an entire
# list through set_collectors.
enable_collectors:
  - [<string>]

# Additional collectors to disable on top of the default set of disabled
# collectors. Takes precedence over enable_collectors.
#
# This is useful if you have a few collectors you do not want to run that
# are enabled by default, but do not want to explicitly provide an entire
# list through set_collectors.
disable_collectors:
  - [<string>]

# procfs mountpoint.
[procfs_path: <string> | default = "/proc"]

# sysfs mountpoint.
[sysfs_path: <string> | default = "/sys"]

# rootfs mountpoint. If running in docker, the root filesystem of the host
# machine should be mounted and this value should be changed to the mount
# directory.
[rootfs_path: <string> | default = "/"]

# Expose expensive bcache priority stats.
[enable_bcache_priority_stats: <boolean>]

# Regexp of `bugs` field in cpu info to filter.
[cpu_bugs_include: <string>]

# Enable the node_cpu_guest_seconds_total metric.
[enable_cpu_guest_seconds_metric: <boolean> | default = true]

# Enable the cpu_info metric for the cpu collector.
[enable_cpu_info_metric: <boolean> | default = true]

# Regexp of `flags` field in cpu info to filter.
[cpu_flags_include: <string>]

# Regexp of devices to ignore for diskstats.
[diskstats_ignored_devices: <string> | default = "^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\\d+n\\d+p)\\d+$"]

# Regexp of ethtool devices to exclude (mutually exclusive with ethtool_device_include).
[ethtool_device_exclude: <string>]

# Regexp of ethtool devices to include (mutually exclusive with ethtool_device_exclude).
[ethtool_device_include: <string>]

# Regexp of ethtool stats to include.
[ethtool_metrics_include: <string> | default = ".*"]

# Regexp of mount points to ignore for filesystem collector.
[filesystem_mount_points_exclude: <string> | default = "^/(dev|proc|sys|var/lib/docker/.+)($|/)"]

# Regexp of filesystem types to ignore for filesystem collector.
[filesystem_fs_types_exclude: <string> | default = "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"]

# How long to wait for a mount to respond before marking it as stale.
[filesystem_mount_timeout: <duration> | default = "5s"]

# Array of IPVS backend stats labels.
#
# The default is [local_address, local_port, remote_address, remote_port, proto, local_mark].
ipvs_backend_labels:
  [- <string>]

# NTP server to use for ntp collector.
[ntp_server: <string> | default = "127.0.0.1"]

# NTP protocol version.
[ntp_protocol_version: <int> | default = 4]

# Certify that the server address is not a public ntp server.
[ntp_server_is_local: <boolean> | default = false]

# IP TTL to use while sending NTP query.
[ntp_ip_ttl: <int> | default = 1]

# Max accumulated distance to the root.
[ntp_max_distance: <duration> | default = "3466080us"]

# Offset between local clock and local ntpd time to tolerate.
[ntp_local_offset_tolerance: <duration> | default = "1ms"]

# Regexp of net devices to ignore for netclass collector.
[netclass_ignored_devices: <string> | default = "^$"]

# Ignore net devices with invalid speed values. This will default to true in
# node_exporter 2.0.
[netclass_ignore_invalid_speed_device: <boolean> | default = false]

# Enable collecting address-info for every device.
[netdev_address_info: <boolean>]

# Regexp of net devices to exclude (mutually exclusive with include).
[netdev_device_exclude: <string> | default = ""]

# Regexp of net devices to include (mutually exclusive with exclude).
[netdev_device_include: <string> | default = ""]

# Regexp of fields to return for netstat collector.
[netstat_fields: <string> | default = "^(.*_(InErrors|InErrs)|Ip_Forwarding|Ip(6|Ext)_(InOctets|OutOctets)|Icmp6?_(InMsgs|OutMsgs)|TcpExt_(Listen.*|Syncookies.*|TCPSynRetrans|TCPTimeouts)|Tcp_(ActiveOpens|InSegs|OutSegs|OutRsts|PassiveOpens|RetransSegs|CurrEstab)|Udp6?_(InDatagrams|OutDatagrams|NoPorts|RcvbufErrors|SndbufErrors))$"]

# List of CPUs from which perf metrics should be collected.
[perf_cpus: <string> | default = ""]

# Array of perf tracepoints that should be collected.
perf_tracepoint:
  [- <string>]

# Regexp of power supplies to ignore for the powersupplyclass collector.
[powersupply_ignored_supplies: <string> | default = "^$"]

# Path to runit service directory.
[runit_service_dir: <string> | default = "/etc/service"]

# XML RPC endpoint for the supervisord collector.
#
# Setting SUPERVISORD_URL in the environment will override the default value.
# An explicit value in the YAML config takes precedence over the environment
# variable.
[supervisord_url: <string> | default = "http://localhost:9001/RPC2"]

# Regexp of systemd units to include. Units must both match include and not
# match exclude to be collected.
[systemd_unit_include: <string> | default = ".+"]

# Regexp of systemd units to exclude. Units must both match include and not
# match exclude to be collected.
[systemd_unit_exclude: <string> | default = ".+\\.(automount|device|mount|scope|slice)"]

# Enables service unit tasks metrics unit_tasks_current and unit_tasks_max.
[systemd_enable_task_metrics: <boolean> | default = false]

# Enables service unit metric service_restart_total.
[systemd_enable_restarts_metrics: <boolean> | default = false]

# Enables service unit metric unit_start_time_seconds.
[systemd_enable_start_time_metrics: <boolean> | default = false]

# Regexp of tapestats devices to ignore.
[tapestats_ignored_devices: <string> | default = "^$"]

# Directory to read *.prom files from for the textfile collector.
[textfile_directory: <string> | default = ""]

# Regexp of fields to return for the vmstat collector.
[vmstat_fields: <string> | default = "^(oom_kill|pgpg|pswp|pg.*fault).*"]
```

Customizing node_exporter collectors

In the integrations node_exporter configuration, you can control which collectors are active by setting set_collectors, enable_collectors, and disable_collectors. The available collector names are:

```go
const (
	CollectorARP          = "arp"
	CollectorBCache       = "bcache"
	CollectorBTRFS        = "btrfs"
	CollectorBonding      = "bonding"
	CollectorBootTime     = "boottime"
	CollectorBuddyInfo    = "buddyinfo"
	CollectorCPU          = "cpu"
	CollectorCPUFreq      = "cpufreq"
	CollectorConntrack    = "conntrack"
	CollectorDMI          = "dmi"
	CollectorDRBD         = "drbd"
	CollectorDRM          = "drm"
	CollectorDevstat      = "devstat"
	CollectorDiskstats    = "diskstats"
	CollectorEDAC         = "edac"
	CollectorEntropy      = "entropy"
	CollectorEthtool      = "ethtool"
	CollectorExec         = "exec"
	CollectorFibrechannel = "fibrechannel"
	CollectorFileFD       = "filefd"
	CollectorFilesystem   = "filesystem"
	CollectorHWMon        = "hwmon"
	CollectorIPVS         = "ipvs"
	CollectorInfiniband   = "infiniband"
	CollectorInterrupts   = "interrupts"
	CollectorKSMD         = "ksmd"
	CollectorLnstat       = "lnstat"
	CollectorLoadAvg      = "loadavg"
	CollectorLogind       = "logind"
	CollectorMDADM        = "mdadm"
	CollectorMeminfo      = "meminfo"
	CollectorMeminfoNuma  = "meminfo_numa"
	CollectorMountstats   = "mountstats"
	CollectorNFS          = "nfs"
	CollectorNFSD         = "nfsd"
	CollectorNTP          = "ntp"
	CollectorNVME         = "nvme"
	CollectorNetclass     = "netclass"
	CollectorNetdev       = "netdev"
	CollectorNetstat      = "netstat"
	CollectorNetworkRoute = "network_route"
	CollectorOS           = "os"
	CollectorPerf         = "perf"
	CollectorPowersuppply = "powersupplyclass"
	CollectorPressure     = "pressure"
	CollectorProcesses    = "processes"
	CollectorQDisc        = "qdisc"
	CollectorRAPL         = "rapl"
	CollectorRunit        = "runit"
	CollectorSchedstat    = "schedstat"
	CollectorSockstat     = "sockstat"
	CollectorSoftnet      = "softnet"
	CollectorStat         = "stat"
	CollectorSupervisord  = "supervisord"
	CollectorSystemd      = "systemd"
	CollectorTCPStat      = "tcpstat"
	CollectorTapestats    = "tapestats"
	CollectorTextfile     = "textfile"
	CollectorThermal      = "thermal"
	CollectorThermalzone  = "thermal_zone"
	CollectorTime         = "time"
	CollectorTimex        = "timex"
	CollectorUDPQueues    = "udp_queues"
	CollectorUname        = "uname"
	CollectorVMStat       = "vmstat"
	CollectorWiFi         = "wifi"
	CollectorXFS          = "xfs"
	CollectorZFS          = "zfs"
	CollectorZoneinfo     = "zoneinfo"
)
```
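For example, a minimal configuration sketch (using the collector names above) that runs only an explicit set of collectors and then disables one of them:

```yaml
integrations:
  node_exporter:
    enabled: true
    # Only these collectors run; everything else is disabled.
    set_collectors:
      - cpu
      - meminfo
      - diskstats
      - filesystem
      - netdev
    # disable_collectors takes precedence, so diskstats ends up disabled.
    disable_collectors:
      - diskstats
```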