Process Exporter
grafana-agent内置集成了process-exporter,基于/proc的文件分析结果,来收集Linux系统进程相关的指标(注意,非Linux系统开启该exporter不起作用)。
如果grafana-agent运行在container中,那么在容器的启动命令中,要做以下调整,即将宿主机的/proc目录映射到容器中相应的位置。
docker run \-v "/proc:/proc:ro" \-v /tmp/agent:/etc/agent \-v /path/to/config.yaml:/etc/agent-config/agent.yaml \grafana/agent:v0.23.0 \--config.file=/etc/agent-config/agent.yaml
注意,将/path/to/config.yaml替换成您自己相应的配置文件。
如果grafana-agent运行在Kubernetes中,那么同样的需要在manifest文件中,做如下调整,即将宿主机的/proc目录映射到容器中相应的位置。
apiVersion: v1kind: Podmetadata:name: grafana-agentspec:containers:- image: grafana/agent:v0.23.0name: agentargs:- --config.file=/etc/agent-config/agent.yamlvolumeMounts:- name: procfsmountPath: /procreadOnly: truevolumes:- name: procfshostPath:path: /proc
配置并启用process_exporter
如下的配置,将会开启process_exporter,并追踪系统中的所有进程。
process_exporter:enabled: trueprocess_names:- name: "{{.Comm}}"cmdline:- '.+'
采集的指标列表
# Context switches# 上下文切换数量# Counternamedprocess_namegroup_context_switches_total# Cpu user/system usage in seconds# CPU 时间(秒)# Counternamedprocess_namegroup_cpu_seconds_total# Major page faults# 主要页缺失次数# Counternamedprocess_namegroup_major_page_faults_total# Minor page faults# 次要页缺失次数# Counternamedprocess_namegroup_minor_page_faults_total# number of bytes of memory in use# 内存占用(byte)# Gaugenamedprocess_namegroup_memory_bytes# number of processes in this group# 同名进程数量# Gaugenamedprocess_namegroup_num_procs# Number of processes in states Running, Sleeping, Waiting, Zombie, or Other# 同名进程状态分布# Gaugenamedprocess_namegroup_states# Number of threads# 线程数量# Gaugenamedprocess_namegroup_num_threads# start time in seconds since 1970/01/01 of oldest process in group# 启动时间戳# Gaugenamedprocess_namegroup_oldest_start_time_seconds# number of open file descriptors for this group# 打开文件描述符数量# Gaugenamedprocess_namegroup_open_filedesc# the worst (closest to 1) ratio between open fds and max fds among all procs in this group# 打开文件数 / 允许打开文件数# Gaugenamedprocess_namegroup_worst_fd_ratio# number of bytes read by this group# 读数据量(byte)# Counternamedprocess_namegroup_read_bytes_total# number of bytes written by this group# 写数据量(byte)# Counternamedprocess_namegroup_write_bytes_total# Number of threads in this group waiting on each wchan# 内核wchan等待线程数量# Gaugenamedprocess_namegroup_threads_wchan
process_exporter的详细配置项说明
# Enables the process_exporter integration, allowing the Agent to automatically# collect system metrics from the host UNIX system.[enabled: <boolean> | default = false]# Sets an explicit value for the instance label when the integration is# self-scraped. Overrides inferred values.## The default value for this integration is inferred from the agent hostname# and HTTP listen port, delimited by a colon.[instance: <string>]# Automatically collect metrics from this integration. If disabled,# the process_exporter integration will be run but not scraped and thus not# remote-written. Metrics for the integration will be exposed at# /integrations/process_exporter/metrics and can be scraped by an external# process.[scrape_integration: <boolean> | default = <integrations_config.scrape_integrations>]# How often should the metrics be collected? Defaults to# prometheus.global.scrape_interval.[scrape_interval: <duration> | default = <global_config.scrape_interval>]# The timeout before considering the scrape a failure. Defaults to# prometheus.global.scrape_timeout.[scrape_timeout: <duration> | default = <global_config.scrape_timeout>]# Allows for relabeling labels on the target.relabel_configs:[- <relabel_config> ... ]# Relabel metrics coming from the integration, allowing to drop series# from the integration that you don't care about.metric_relabel_configs:[ - <relabel_config> ... ]# How frequent to truncate the WAL for this integration.[wal_truncate_frequency: <duration> | default = "60m"]# procfs mountpoint.[procfs_path: <string> | default = "/proc"]# If a proc is tracked, track with it any children that aren't a part of their# own group.[track_children: <boolean> | default = true]# Report on per-threadname metrics as well.[track_threads: <boolean> | default = true]# Gather metrics from smaps file, which contains proportional resident memory# size.[gather_smaps: <boolean> | default = true]# Recheck process names on each scrape.[recheck_on_scrape: <boolean> | default = false]# A collection of matching rules to use for deciding which processes to# monitor. Each config can match multiple processes to be tracked as a single# process "group."process_names:[- <process_matcher_config>]
process_matcher_config
# The name to use for identifying the process group name in the metric. By# default, it uses the base path of the executable.## The following template variables are available:## - {{.Comm}}: Basename of the original executable from /proc/<pid>/stat# - {{.ExeBase}}: Basename of the executable from argv[0]# - {{.ExeFull}}: Fully qualified path of the executable# - {{.Username}}: Username of the effective user# - {{.Matches}}: Map containing all regex capture groups resulting from# matching a process with the cmdline rule group.# - {{.PID}}: PID of the process. Note that the PID is copied from the# first executable found.# - {{.StartTime}}: The start time of the process. This is useful when combined# with PID as PIDS get reused over time.[name: <string> | default = "{{.ExeBase}}"]# A list of strings that match the base executable name for a process, truncated# at 15 characters. It is derived from reading the second field of# /proc/<pid>/stat minus the parens.## If any of the strings match, the process will be tracked.comm:[- <string>]# A list of strings that match argv[0] for a process. If there are no slashes,# only the basename of argv[0] needs to match. Otherwise the name must be an# exact match. For example, "postgres" may match any postgres binary but# "/usr/local/bin/postgres" can only match a postgres at that path exactly.## If any of the strings match, the process will be tracked.exe:[- <string>]# A list of regular expressions applied to the argv of the process. Each# regex here must match the corresponding argv for the process to be tracked.# The first element that is matched is argv[1].## Regex Captures are added to the .Matches map for use in the name.cmdline:[- <string>]
