Monitoring K8s

Acknowledgement: grafana-agent is powered by Grafana Agent. Grafana Agent is a lightweight telemetry collector based on Prometheus that performs only Prometheus's scraping and remote_write functions. The Agent can also collect metrics, logs, and traces for storage in Grafana Cloud and Grafana Enterprise, as well as in OSS deployments of Loki (logs), Tempo (traces), Prometheus (metrics), and Cortex (metrics). Grafana Agent also contains several integrations (embedded metrics exporters) such as node-exporter, a MySQL exporter, and many more.

The Grafana Agent uses the same code as Prometheus, but keeps only the parts of Prometheus most relevant to interacting with hosted metrics:

  • Service Discovery
  • Scraping
  • Write Ahead Log (WAL)
  • Remote Write

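For a Kubernetes cluster and the applications running on it, we recommend building a complete Kubernetes metrics monitoring system from the following aspects:

Prerequisites

  1. For how to run and start grafana-agent in K8s, see the guide on running grafana-agent in Kubernetes.
  2. We recommend running grafana-agent as a DaemonSet, so that one grafana-agent instance starts on every node.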

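Using kubelet to understand and monitor the basic running state of K8s nodes

Option 1: Access kubelet directly to obtain node status metrics

The kubelet component runs on every node of the Kubernetes cluster and is responsible for maintaining and managing the running state of the Pods on that node. Whether kubelet runs properly directly determines whether the node can be used by the Kubernetes cluster.

Building on Prometheus's service discovery for K8s, in node mode grafana-agent automatically discovers all Nodes in Kubernetes and uses them as monitoring targets. The access addresses of these targets are in fact the kubelet addresses.

Create a ConfigMap that contains the grafana-agent configuration file, as follows: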

    # Settings substituted into the ConfigMap below.
    export NAMESPACE=default
    export CLUSTER_NAME=kubernetes
    export REMOTE_WRITE_URL=http://n9e-server:19000/prometheus/v1/write
    export REMOTE_WRITE_USERNAME=fc_laiwei
    export REMOTE_WRITE_PASSWORD=fc_laiweisecret
    # Render the manifest from the heredoc and pipe it through envsubst into kubectl.
    cat <<EOF |
    kind: ConfigMap
    metadata:
      name: grafana-agent
    apiVersion: v1
    data:
      agent.yaml: |
        server:
          http_listen_port: 12345
        metrics:
          wal_directory: /tmp/grafana-agent-wal
          global:
            scrape_interval: 15s
            scrape_timeout: 10s
            external_labels:
              cluster: ${CLUSTER_NAME}
          configs:
          - name: fc_k8s_scrape
            remote_write:
            - url: ${REMOTE_WRITE_URL}
              basic_auth:
                username: ${REMOTE_WRITE_USERNAME}
                password: ${REMOTE_WRITE_PASSWORD}
            scrape_configs:
            - job_name: integrations/kubernetes/kubelet
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
              - role: node
              relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
    EOF
    envsubst | kubectl apply -n $NAMESPACE -f -

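Recreate the grafana-agent instances

    kubectl rollout restart daemonset/grafana-agent

Here, node mode automatically discovers every kubelet in the cluster as a scrape target, and the labelmap step copies the labels on each Node onto the samples' time series. After reloading the grafana-agent configuration file and recreating the grafana-agent Pods, search for {job="integrations/kubernetes/kubelet"} in the Nightingale dashboard to see the corresponding time series.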
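To make the labelmap step concrete, the following Python sketch mimics what that relabel action does to the labels of one discovered target. The function name `labelmap`, the sample address, and the sample node labels are illustrative assumptions; this is not grafana-agent code, only an approximation of the documented behavior.

```python
import re

def labelmap(labels, regex):
    """Approximate the `labelmap` action: for every label whose name fully
    matches `regex`, add a label named after the first capture group."""
    pattern = re.compile(regex)
    result = dict(labels)
    for name, value in labels.items():
        match = pattern.fullmatch(name)  # relabel regexes are anchored
        if match:
            result[match.group(1)] = value
    return result

# Hypothetical labels for one discovered node.
discovered = {
    "__address__": "10.0.2.15:10250",
    "__meta_kubernetes_node_name": "node-1",
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "node-1",
    "__meta_kubernetes_node_label_kubernetes_io_os": "linux",
}

mapped = labelmap(discovered, r"__meta_kubernetes_node_label_(.+)")
print(mapped["kubernetes_io_hostname"], mapped["kubernetes_io_os"])
```

Labels that do not match the pattern, such as `__address__`, pass through unchanged.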

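Option 2: Obtain kubelet metrics indirectly through the API provided by kube-apiserver

Unlike the first method, which scrapes monitoring data directly from kubelet's metrics service, the second method reaches the kubelet metrics service on each node through the proxy API provided by the Kubernetes api-server.

Create a ConfigMap that contains the grafana-agent configuration file, as follows: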

    # Settings substituted into the ConfigMap below.
    export NAMESPACE=default
    export CLUSTER_NAME=kubernetes
    export REMOTE_WRITE_URL=http://n9e-server:19000/prometheus/v1/write
    export REMOTE_WRITE_USERNAME=fc_laiwei
    export REMOTE_WRITE_PASSWORD=fc_laiweisecret
    # \${1} is escaped so the shell leaves the capture-group reference intact.
    cat <<EOF |
    kind: ConfigMap
    metadata:
      name: grafana-agent
    apiVersion: v1
    data:
      agent.yaml: |
        server:
          http_listen_port: 12345
        metrics:
          wal_directory: /tmp/grafana-agent-wal
          global:
            scrape_interval: 15s
            scrape_timeout: 10s
            external_labels:
              cluster: ${CLUSTER_NAME}
          configs:
          - name: fc_k8s_scrape
            remote_write:
            - url: ${REMOTE_WRITE_URL}
              basic_auth:
                username: ${REMOTE_WRITE_USERNAME}
                password: ${REMOTE_WRITE_PASSWORD}
            scrape_configs:
            - job_name: 'integrations/kubernetes/kubelet'
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
              - role: node
              relabel_configs:
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
              - target_label: __address__
                replacement: kubernetes.default.svc:443
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /api/v1/nodes/\${1}/proxy/metrics
    EOF
    envsubst | kubectl apply -n $NAMESPACE -f -

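Through relabeling, the default address __address__ obtained from Kubernetes is replaced with kubernetes.default.svc:443, and __metrics_path__ is replaced with the api-server proxy path /api/v1/nodes/${1}/proxy/metrics, where ${1} is the node name captured from __meta_kubernetes_node_name.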
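The effect of these two replacements on the final scrape URL can be traced with a short Python sketch. The helper `relabel_replace`, the node name `node-1`, and the kubelet address are assumptions made for illustration; the helper only approximates the default `replace` relabel action and is not grafana-agent code.

```python
import re

def relabel_replace(labels, target_label, replacement,
                    source_labels=(), regex="(.*)", separator=";"):
    """Approximate the default `replace` action: join the source label
    values and, if the anchored regex matches, write the expanded
    replacement into `target_label`."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, joined)
    if match is None:
        return dict(labels)
    # Translate Prometheus-style ${1} references into Python's \g<1> form.
    template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result

# Hypothetical target as discovered in node mode.
target = {
    "__address__": "10.0.2.15:10250",
    "__meta_kubernetes_node_name": "node-1",
}
target = relabel_replace(target, "__address__", "kubernetes.default.svc:443")
target = relabel_replace(
    target, "__metrics_path__", "/api/v1/nodes/${1}/proxy/metrics",
    source_labels=["__meta_kubernetes_node_name"], regex="(.+)",
)

scrape_url = "https://" + target["__address__"] + target["__metrics_path__"]
print(scrape_url)
```

Every node therefore shares the same address and differs only in the node name embedded in the path.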

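With the kubelet metrics from each node, you can evaluate how each node in the cluster is performing. For example:

1. The metric kubelet_pod_start_duration_seconds provides statistics on Pod start-up time for the current node.

    kubelet_pod_start_duration_seconds{quantile="0.99"}

2. Average Pod start-up time (including image pull time):

    kubelet_pod_start_duration_seconds_sum / kubelet_pod_start_duration_seconds_count

In addition, the kubelet_docker_* metrics show how kubelet calls the docker service on the current node, and can therefore indicate whether docker itself is affecting kubelet's performance.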

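Using cAdvisor to understand and monitor the running state of containers on a node

Besides its own metrics, the kubelet component on each node has built-in support for cAdvisor. cAdvisor collects the resource usage of all containers running on the current node, and its metrics are available at kubelet's /metrics/cadvisor endpoint. As with the kubelet metrics, node mode is used here to discover all kubelets automatically, and the scrape job is adjusted with suitable relabeling. As with kubelet's own metrics, there are two ways to collect the cAdvisor metrics:

Option 1: Access kubelet's /metrics/cadvisor endpoint directly; CA certificate verification must be skipped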

    # Settings substituted into the ConfigMap below.
    export NAMESPACE=default
    export CLUSTER_NAME=kubernetes
    export REMOTE_WRITE_URL=http://n9e-server:19000/prometheus/v1/write
    export REMOTE_WRITE_USERNAME=fc_laiwei
    export REMOTE_WRITE_PASSWORD=fc_laiweisecret
    # Render the manifest from the heredoc and pipe it through envsubst into kubectl.
    cat <<EOF |
    kind: ConfigMap
    metadata:
      name: grafana-agent
    apiVersion: v1
    data:
      agent.yaml: |
        server:
          http_listen_port: 12345
        metrics:
          wal_directory: /tmp/grafana-agent-wal
          global:
            scrape_interval: 15s
            scrape_timeout: 10s
            external_labels:
              cluster: ${CLUSTER_NAME}
          configs:
          - name: fc_k8s_scrape
            remote_write:
            - url: ${REMOTE_WRITE_URL}
              basic_auth:
                username: ${REMOTE_WRITE_USERNAME}
                password: ${REMOTE_WRITE_PASSWORD}
            scrape_configs:
            - job_name: 'integrations/kubernetes/cadvisor'
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                insecure_skip_verify: true
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
              - role: node
              relabel_configs:
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /metrics/cadvisor
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
    EOF
    envsubst | kubectl apply -n $NAMESPACE -f -

Option 2: Access kubelet's /metrics/cadvisor endpoint through the proxy address provided by api-server

    # Settings substituted into the ConfigMap below.
    export NAMESPACE=default
    export CLUSTER_NAME=kubernetes
    export REMOTE_WRITE_URL=http://n9e-server:19000/prometheus/v1/write
    export REMOTE_WRITE_USERNAME=fc_laiwei
    export REMOTE_WRITE_PASSWORD=fc_laiweisecret
    # \${1} is escaped so the shell leaves the capture-group reference intact.
    cat <<EOF |
    kind: ConfigMap
    metadata:
      name: grafana-agent
    apiVersion: v1
    data:
      agent.yaml: |
        server:
          http_listen_port: 12345
        metrics:
          wal_directory: /tmp/grafana-agent-wal
          global:
            scrape_interval: 15s
            scrape_timeout: 10s
            external_labels:
              cluster: ${CLUSTER_NAME}
          configs:
          - name: fc_k8s_scrape
            remote_write:
            - url: ${REMOTE_WRITE_URL}
              basic_auth:
                username: ${REMOTE_WRITE_USERNAME}
                password: ${REMOTE_WRITE_PASSWORD}
            scrape_configs:
            - job_name: 'integrations/kubernetes/cadvisor'
              scheme: https
              tls_config:
                ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
              kubernetes_sd_configs:
              - role: node
              relabel_configs:
              - target_label: __address__
                replacement: kubernetes.default.svc:443
              - source_labels: [__meta_kubernetes_node_name]
                regex: (.+)
                target_label: __metrics_path__
                replacement: /api/v1/nodes/\${1}/proxy/metrics/cadvisor
              - action: labelmap
                regex: __meta_kubernetes_node_label_(.+)
    EOF
    envsubst | kubectl apply -n $NAMESPACE -f -

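Using NodeExporter to monitor node resource usage

To collect the resource usage of each node in the cluster, we can use the NodeExporter built into grafana-agent. For the specific steps, see: grafana-agent node_exporter

Using kube-apiserver to understand the detailed running state of the whole K8s cluster

kube-apiserver is the entry point for managing the whole Kubernetes cluster and is responsible for exposing the Kubernetes API. The kube-apiserver component is usually deployed independently, outside the cluster. So that applications deployed inside the cluster (Kubernetes add-ons or user applications) can interact with kube-apiserver, Kubernetes by default creates a Service named kubernetes in the default namespace, as shown below: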

    $ kubectl get svc kubernetes -o wide
    NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE    SELECTOR
    kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   166d   <none>

The actual backend addresses behind this kubernetes Service are maintained through endpoints, as shown below:

    $ kubectl get endpoints kubernetes
    NAME         ENDPOINTS        AGE
    kubernetes   10.0.2.15:8443   166d

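In this way, applications or system hosts in the cluster can reach the externally deployed kube-apiserver instance through the cluster-internal DNS name kubernetes.default.svc.

Therefore, to monitor kube-apiserver metrics, we only need to find all backend addresses of the kubernetes Service through the endpoints resource.

As shown below, create the scrape job kubernetes-apiservers with the service discovery mode set to endpoints. grafana-agent looks up all endpoints in the current cluster and uses relabeling to decide whether each one is an apiserver address: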

    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
      - target_label: __address__
        replacement: kubernetes.default.svc:443

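In relabel_configs, the first step checks whether the current endpoint is an address that belongs to kube-apiserver. The second step replaces the scrape address with kubernetes.default.svc:443. After reloading the configuration file and recreating the grafana-agent instances, query {job="kubernetes-apiservers"} (the job name configured above) in the Nightingale dashboard to get the kube-apiserver metrics.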
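The filtering done by the first relabel step can be sketched in Python as follows. The `keep` helper and the second endpoint entry are assumptions made for illustration; the sketch only approximates how a `keep` action joins its source labels with `;` and matches them against an anchored regex.

```python
import re

def keep(labels, source_labels, regex, separator=";"):
    """Approximate the `keep` action: a target survives only if its joined
    source label values fully match the regex."""
    joined = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None

SOURCE_LABELS = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]

# One apiserver endpoint plus one hypothetical unrelated endpoint.
endpoints = [
    {
        "__address__": "10.0.2.15:8443",
        "__meta_kubernetes_namespace": "default",
        "__meta_kubernetes_service_name": "kubernetes",
        "__meta_kubernetes_endpoint_port_name": "https",
    },
    {
        "__address__": "10.244.0.7:9115",
        "__meta_kubernetes_namespace": "default",
        "__meta_kubernetes_service_name": "blackbox-exporter",
        "__meta_kubernetes_endpoint_port_name": "blackbox",
    },
]

kept = [e["__address__"] for e in endpoints
        if keep(e, SOURCE_LABELS, "default;kubernetes;https")]
print(kept)
```

Only the endpoint whose namespace, service name, and port name are exactly default, kubernetes, and https remains as a target.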

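Using BlackboxExporter to understand and monitor network connectivity in the K8s cluster

To probe Ingresses and Services, we need to deploy a Blackbox Exporter instance in the K8s cluster. As shown below, create blackbox-exporter.yaml to describe the deployment: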

    # Deployment uses apps/v1; extensions/v1beta1 was removed in Kubernetes 1.16.
    cat << EOF |
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: blackbox-exporter
      name: blackbox-exporter
    spec:
      ports:
      - name: blackbox
        port: 9115
        protocol: TCP
      selector:
        app: blackbox-exporter
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: blackbox-exporter
      name: blackbox-exporter
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: blackbox-exporter
      template:
        metadata:
          labels:
            app: blackbox-exporter
        spec:
          containers:
          - image: prom/blackbox-exporter
            imagePullPolicy: IfNotPresent
            name: blackbox-exporter
    EOF
    kubectl apply -f -

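The command above deploys one Blackbox Exporter Pod in the K8s cluster and exposes it inside the cluster through the blackbox-exporter Service at blackbox-exporter.default.svc.cluster.local. Any service in the cluster can reach the Blackbox Exporter instance through this internal DNS name: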

    $ kubectl get pods
    NAME                                READY   STATUS    RESTARTS   AGE
    blackbox-exporter-f77fc78b6-72bl5   1/1     Running   0          4s
    $ kubectl get svc
    NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
    blackbox-exporter   ClusterIP   10.109.144.192   <none>        9115/TCP   3m

For grafana-agent to probe Services automatically, it must find all Services through service discovery. As shown below, add a scrape job named kubernetes-services to the grafana-agent configuration file:

    - job_name: 'kubernetes-services'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.default.svc.cluster.local:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        target_label: kubernetes_name

In this job, setting the role in kubernetes_sd_configs to service selects the service discovery mode:

    kubernetes_sd_configs:
    - role: service

To single out the Service instances that should be probed, we check for the annotation 'prometheus.io/probe: true' and keep only the Services that carry it:

    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
      action: keep
      regex: true

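Next, the Service address __address__ obtained through service discovery is turned into a request parameter for fetching the monitoring data. At the same time, __address__ is pointed at the Blackbox Exporter instance, and the instance label is rewritten: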

    - source_labels: [__address__]
      target_label: __param_target
    - target_label: __address__
      replacement: blackbox-exporter.default.svc.cluster.local:9115
    - source_labels: [__param_target]
      target_label: instance
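These three relabel steps can be traced with a short Python sketch that builds the request grafana-agent would end up sending. The function `probe_request` and the Service address `my-svc.default.svc:80` are hypothetical; the sketch assumes the job's default http scheme together with the /probe path and http_2xx module from the job definition.

```python
from urllib.parse import urlencode

def probe_request(discovered_address, module="http_2xx",
                  exporter="blackbox-exporter.default.svc.cluster.local:9115"):
    """Trace the three relabel steps for one discovered Service address and
    return the resulting probe URL and `instance` label."""
    labels = {"__address__": discovered_address}
    # Step 1: copy the Service address into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: point the scrape at the Blackbox Exporter instead of the Service.
    labels["__address__"] = exporter
    # Step 3: keep the probed address visible as the `instance` label.
    labels["instance"] = labels["__param_target"]
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}/probe?{query}", labels["instance"]

url, instance = probe_request("my-svc.default.svc:80")
print(url)
print(instance)
```

The scrape goes to the Blackbox Exporter, while the instance label still names the Service being probed.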

Finally, extra labels are added to the monitoring samples:

    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      target_label: kubernetes_name

The process for Ingress is quite similar. The grafana-agent job configuration for probing Ingresses is given here for reference:

    - job_name: 'kubernetes-ingresses'
      metrics_path: /probe
      params:
        module: [http_2xx]
      kubernetes_sd_configs:
      - role: ingress
      relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
        regex: (.+);(.+);(.+)
        replacement: ${1}://${2}${3}
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.default.svc.cluster.local:9115
      - source_labels: [__param_target]
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_ingress_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_ingress_name]
        target_label: kubernetes_name

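Using kube-state-metrics to understand and monitor the running state of the K8s cluster itself and its applications

kube-state-metrics focuses on answering questions such as:

  • How many replicas did I schedule? How many are available now?
  • How many Pods are in the running/stopped/terminated state?
  • How many times has a Pod restarted?
  • How many jobs do I have running?

kube-state-metrics is built on client-go. It polls the Kubernetes API and converts Kubernetes's structured information into metrics. The metrics it supports include: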

  • CronJob Metrics
  • DaemonSet Metrics
  • Deployment Metrics
  • Job Metrics
  • LimitRange Metrics
  • Node Metrics
  • PersistentVolume Metrics
  • PersistentVolumeClaim Metrics
  • Pod Metrics
  • Pod Disruption Budget Metrics
  • ReplicaSet Metrics
  • ReplicationController Metrics
  • ResourceQuota Metrics
  • Service Metrics
  • StatefulSet Metrics
  • Namespace Metrics
  • Horizontal Pod Autoscaler Metrics
  • Endpoint Metrics
  • Secret Metrics
  • ConfigMap Metrics

Taking Pod as an example:

  • kube_pod_info
  • kube_pod_owner
  • kube_pod_status_phase
  • kube_pod_status_ready
  • kube_pod_status_scheduled
  • kube_pod_container_status_waiting
  • kube_pod_container_status_terminated_reason

Deployment manifests

    ├── cluster-role-binding.yaml
    ├── cluster-role.yaml
    ├── deployment.yaml
    ├── service-account.yaml
    └── service.yaml

The main images are:

  • image: quay.io/coreos/kube-state-metrics:v2.4.2
  • image: k8s.gcr.io/kube-state-metrics/kube-state-metrics

Because quay.io/coreos/kube-state-metrics images will no longer be updated, we recommend k8s.gcr.io/kube-state-metrics/kube-state-metrics, which is the new canonical location.

For the Pod's resource limits, in general:

    200MiB memory
    0.1 cores

For clusters of more than 100 nodes:

    2MiB memory per node
    0.001 cores per node
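The sizing rule above can be written as a small Python helper. The function name and the choice of integer MiB and millicores (0.001 cores = 1 millicore) are assumptions made for this sketch; the thresholds are the ones listed above.

```python
def kube_state_metrics_resources(node_count):
    """Return (memory_mib, cpu_millicores) following the sizing rule above."""
    if node_count > 100:
        # Large clusters: 2 MiB memory and 0.001 cores (1 millicore) per node.
        return 2 * node_count, 1 * node_count
    # General case: 200 MiB memory and 0.1 cores (100 millicores).
    return 200, 100

for nodes in (20, 250):
    memory_mib, cpu_millicores = kube_state_metrics_resources(nodes)
    print(f"{nodes} nodes: {memory_mib}MiB memory, {cpu_millicores}m CPU")
```

For example, a 250-node cluster would be given 500MiB of memory and 250 millicores under this rule.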

Because kube-state-metrics-service.yaml carries the prometheus.io/scrape: 'true' annotation, its metrics are exposed to grafana-agent. grafana-agent automatically discovers kube-state-metrics under the kubernetes-service-endpoints job and starts pulling metrics; no other configuration is needed.

Common scenarios once kube-state-metrics is in use:

  • A Job has failed: kube_job_status_failed{job="kubernetes-service-endpoints",k8s_app="kube-state-metrics"}==1
  • A cluster node is in an abnormal state: kube_node_status_condition{condition="Ready",status!="true"}==1
  • A Pod in the cluster failed to start: kube_pod_status_phase{phase=~"Failed|Unknown"}==1
  • A Pod container restarted within the last 30 minutes: changes(kube_pod_container_status_restarts_total[30m])>0
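To illustrate the last rule, the following Python sketch reproduces what PromQL's changes() computes over a window of samples: the number of times the value differs from the previous sample. The helper name and the sample restart counts are hypothetical.

```python
def changes(samples):
    """Count how many times the value changes between consecutive samples,
    mirroring what PromQL's changes() reports for a range of samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)

# Hypothetical restart-counter samples scraped over the last 30 minutes.
restarts_last_30m = [3, 3, 4, 4, 5]
stable_last_30m = [7, 7, 7, 7]

print(changes(restarts_last_30m) > 0)  # container restarted: rule matches
print(changes(stable_last_30m) > 0)    # no restart: rule does not match
```

A container that never restarted in the window yields zero changes, so the >0 comparison filters it out.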

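References

Acknowledgement: this document was revised and extended from yunlzheng's "Monitoring a Kubernetes Cluster" (监控Kubernetes集群). Copyright of the relevant text belongs to the original author, yunlzheng, to whom we extend our thanks.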