Use Prometheus to monitor the Karmada control plane

Prometheus, a Cloud Native Computing Foundation project, is a system and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts when specified conditions are observed.

This document walks through an example of using Prometheus to monitor the Karmada control plane.

Start up Karmada clusters

Clone the Karmada repository and run the following script from the Karmada directory.

  hack/local-up-karmada.sh
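
Once the script finishes, it can help to confirm that both contexts used later in this guide (karmada-host and karmada-apiserver) are present before continuing. The following is a minimal sketch, not part of the Karmada tooling: the function name `check_karmada_contexts` is hypothetical, and it assumes `kubectl` is already pointed at the kubeconfig written by the script.

```shell
# Hypothetical helper: succeeds only if both contexts used in this guide exist.
# Assumes KUBECONFIG (or the default kubeconfig) already points at the file
# written by hack/local-up-karmada.sh.
check_karmada_contexts() {
  local contexts ctx
  # List context names only, one per line.
  contexts="$(kubectl config get-contexts -o name)" || return 1
  for ctx in karmada-host karmada-apiserver; do
    if ! printf '%s\n' "$contexts" | grep -qx "$ctx"; then
      echo "missing context: $ctx" >&2
      return 1
    fi
  done
  echo "both contexts found"
}
```

A typical call would be `export KUBECONFIG="$HOME/.kube/karmada.config"` followed by `check_karmada_contexts`. The kubeconfig path is an assumption about the script's default output location, so adjust it if the script reports a different file.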

Start Prometheus

  1. Create the RBAC resources in both the karmada-host context and the karmada-apiserver context

    apiVersion: v1
    kind: Namespace
    metadata:
      name: monitor
      labels:
        name: monitor
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups: [""]
      resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      verbs: ["get", "list", "watch"]
    - apiGroups:
      - extensions
      resources:
      - ingresses
      verbs: ["get", "list", "watch"]
    - nonResourceURLs: ["/metrics"]
      verbs: ["get"]
    - apiGroups:
      - 'cluster.karmada.io'
      resources:
      - '*'
      verbs:
      - '*'
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: monitor
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: monitor
  2. Create a Secret for the ServiceAccount (required in Kubernetes v1.24+, where creating a ServiceAccount no longer generates a Secret automatically)

    apiVersion: v1
    kind: Secret
    type: kubernetes.io/service-account-token
    metadata:
      name: prometheus
      namespace: monitor
      annotations:
        kubernetes.io/service-account.name: "prometheus"
  3. Get the token for accessing the Karmada API server

    kubectl get secret prometheus -o=jsonpath='{.data.token}' -n monitor --context "karmada-apiserver" | base64 -d
  4. Create the Prometheus resource objects in the karmada-host context. Replace <karmada-token> (2 places) with the token obtained in step 3

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: monitor
    data:
      prometheus.yml: |-
        global:
          scrape_interval: 15s
          evaluation_interval: 15s
        scrape_configs:
        - job_name: 'karmada-scheduler'
          kubernetes_sd_configs:
          - role: pod
          scheme: http
          tls_config:
            insecure_skip_verify: true
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
            action: keep
            regex: karmada-system;karmada-scheduler
          - target_label: __address__
            source_labels: [__address__]
            regex: '(.*)'
            replacement: '${1}:10351'
            action: replace
        - job_name: 'karmada-controller-manager'
          kubernetes_sd_configs:
          - role: pod
          scheme: http
          tls_config:
            insecure_skip_verify: true
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app]
            action: keep
            regex: karmada-system;karmada-controller-manager
          - target_label: __address__
            source_labels: [__address__]
            regex: '(.*)'
            replacement: '${1}:8080'
            action: replace
        - job_name: 'kubernetes-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
          - target_label: __address__
            replacement: kubernetes.default.svc:443
        - job_name: 'karmada-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            insecure_skip_verify: true
          bearer_token: <karmada-token> # need the true karmada token
          relabel_configs:
          - source_labels: [__meta_kubernetes_pod_label_app]
            action: keep
            regex: karmada-apiserver
          - target_label: __address__
            replacement: karmada-apiserver.karmada-system.svc:5443
        - job_name: 'karmada-aggregated-apiserver'
          kubernetes_sd_configs:
          - role: endpoints
          scheme: https
          tls_config:
            insecure_skip_verify: true
          bearer_token: <karmada-token> # need the true karmada token
          relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoints_name]
            action: keep
            regex: karmada-system;karmada-aggregated-apiserver;karmada-aggregated-apiserver
          - target_label: __address__
            replacement: karmada-aggregated-apiserver.karmada-system.svc:443
        - job_name: 'kubernetes-cadvisor'
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          relabel_configs:
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          metric_relabel_configs:
          - action: replace
            source_labels: [id]
            regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
            target_label: rkt_container_name
            replacement: '${2}-${1}'
          - action: replace
            source_labels: [id]
            regex: '^/system\.slice/(.+)\.service$'
            target_label: systemd_service_name
            replacement: '${1}'
          - source_labels: [pod]
            separator: ;
            regex: (.+)
            target_label: pod_name
            replacement: $1
            action: replace
    ---
    apiVersion: v1
    kind: "Service"
    metadata:
      name: prometheus
      namespace: monitor
      labels:
        name: prometheus
    spec:
      ports:
      - name: prometheus
        protocol: TCP
        port: 9090
        targetPort: 9090
        nodePort: 31801
      selector:
        app: prometheus
      type: NodePort
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        name: prometheus
      name: prometheus
      namespace: monitor
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus
      template:
        metadata:
          labels:
            app: prometheus
        spec:
          serviceAccountName: prometheus
          containers:
          - name: prometheus
            image: prom/prometheus:latest
            command:
            - "/bin/prometheus"
            args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prom-data"
            - "--storage.tsdb.retention.time=180d"
            ports:
            - containerPort: 9090
              protocol: TCP
            volumeMounts:
            - mountPath: "/etc/prometheus"
              name: prometheus-config
            - mountPath: "/prom-data"
              name: prom-data
          initContainers:
          - name: prometheus-data-permission-fix
            image: busybox
            command: ["/bin/chmod","-R","777", "/data"]
            volumeMounts:
            - name: prom-data
              mountPath: /data
          volumes:
          - name: prometheus-config
            configMap:
              name: prometheus-config
          - name: prom-data
            hostPath:
              path: /var/lib/prom-data
              type: DirectoryOrCreate
  5. Use any node IP of the control plane and the port number (default 31801) to open the Prometheus monitoring page of the control plane
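
Steps 1 to 4 can be strung together in a small script. The sketch below is one possible way to do that, not an official Karmada helper. The function names and the manifest file names (`rbac.yaml`, `secret.yaml`, `prometheus.yaml`) are hypothetical placeholders for the manifests shown above, and the Secret is assumed to be needed in the karmada-apiserver context because step 3 reads it from there.

```shell
# Step 1: apply the RBAC manifest ($1) to both contexts.
apply_rbac() {
  local ctx
  for ctx in karmada-host karmada-apiserver; do
    kubectl apply -f "$1" --context "$ctx" || return 1
  done
}

# Step 3: print the decoded ServiceAccount token from karmada-apiserver.
get_karmada_token() {
  kubectl get secret prometheus -n monitor --context karmada-apiserver \
    -o=jsonpath='{.data.token}' | base64 -d
}

# Step 4: print the manifest ($1) with every <karmada-token> placeholder
# replaced by the token ($2). A ServiceAccount token is a JWT, so it should
# not contain the '|' used here as the sed delimiter.
render_prometheus_manifest() {
  local manifest="$1" token="$2"
  sed "s|<karmada-token>|${token}|g" "$manifest"
}

# Example usage (file names are placeholders):
#   apply_rbac rbac.yaml
#   kubectl apply -f secret.yaml --context karmada-apiserver
#   render_prometheus_manifest prometheus.yaml "$(get_karmada_token)" \
#     | kubectl apply --context karmada-host -f -
```

Piping the rendered manifest straight into `kubectl apply` avoids writing the token to a file on disk.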

Visualizing metrics using Grafana

For a better visual experience, we can also use Grafana with Prometheus, together with the dashboards provided by the community.

  1. Install Grafana with Helm

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    cat <<EOF | helm upgrade --install grafana grafana/grafana --kube-context "karmada-host" -n monitor -f -
    persistence:
      enabled: true
      storageClassName: local-storage
    service:
      enabled: true
      type: NodePort
      nodePort: 31802
      targetPort: 3000
      port: 80
    EOF
  2. Get the login password for the Grafana web UI

    kubectl get secret --namespace monitor grafana -o jsonpath="{.data.admin-password}" --context "karmada-host" | base64 --decode ; echo
  3. Use any node IP of the control plane and the port number (default 31802) to open the Grafana web UI of the control plane
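
Grafana still needs Prometheus registered as a data source before dashboards can show anything. This can be done in the web UI. As an alternative, the sketch below writes an extra Helm values file that uses the Grafana chart's `datasources` value. The function name and the output file name are hypothetical, and the URL assumes the in-cluster DNS name of the `prometheus` Service in the `monitor` namespace created earlier.

```shell
# Hypothetical helper: write a Helm values file ($1) that provisions
# Prometheus as Grafana's default data source.
# Assumes the Service "prometheus" in namespace "monitor" on port 9090.
write_grafana_datasource_values() {
  cat > "$1" <<'EOF'
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: http://prometheus.monitor.svc:9090
      isDefault: true
EOF
}
```

After calling `write_grafana_datasource_values grafana-datasource.yaml`, the file could be applied on top of the existing release with `helm upgrade grafana grafana/grafana --kube-context "karmada-host" -n monitor --reuse-values -f grafana-datasource.yaml`.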


Attention:

In Kubernetes v1.24+, the metrics from cAdvisor may be missing the image, name, and container labels. This may cause the metrics of the Karmada components (e.g. karmada-apiserver, karmada-controller-manager) to be unobserved.

Reference