Managing metrics

You can collect metrics to monitor how cluster components and your own workloads are performing.

Understanding metrics

In OKD 4.12, cluster components are monitored by scraping metrics exposed through service endpoints. You can also configure metrics collection for user-defined projects.

You can define the metrics that you want to provide for your own workloads by using Prometheus client libraries at the application level.
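For example, a minimal instrumentation sketch using the Python prometheus_client library (the library choice, port, and metric values here are illustrative, not taken from the sample application) could look like the following:

  from prometheus_client import Counter, Gauge, start_http_server
  import time

  # The client library appends a _total suffix to counter names, so this
  # counter is exported as http_requests_total.
  HTTP_REQUESTS = Counter(
      "http_requests", "Count of all HTTP requests", ["code", "method"]
  )
  VERSION = Gauge("version", "Version information about this binary", ["version"])

  if __name__ == "__main__":
      VERSION.labels(version="v0.1.0").set(1)
      start_http_server(8080)  # serve /metrics on port 8080
      HTTP_REQUESTS.labels(code="200", method="get").inc()
      while True:
          time.sleep(60)  # keep the process, and the metrics endpoint, alive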

In OKD, metrics are exposed through an HTTP service endpoint under the /metrics canonical name. You can list all available metrics for a service by running a curl query against http://<endpoint>/metrics. For instance, you can expose a route to the prometheus-example-app example service and then run the following to view all of its available metrics:

  $ curl http://<example_app_endpoint>/metrics

Example output

  # HELP http_requests_total Count of all HTTP requests
  # TYPE http_requests_total counter
  http_requests_total{code="200",method="get"} 4
  http_requests_total{code="404",method="get"} 2
  # HELP version Version information about this binary
  # TYPE version gauge
  version{version="v0.1.0"} 1

Setting up metrics collection for user-defined projects

You can create a ServiceMonitor resource to scrape metrics from a service endpoint in a user-defined project. This assumes that your application uses a Prometheus client library to expose metrics under the /metrics canonical name.

This section describes how to deploy a sample service in a user-defined project and then create a ServiceMonitor resource that defines how that service should be monitored.

Deploying a sample service

To test monitoring of a service in a user-defined project, you can deploy a sample service.

Procedure

  1. Create a YAML file for the service configuration. In this example, it is called prometheus-example-app.yaml.

  2. Add the following deployment and service configuration details to the file:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: ns1
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-example-app
      name: prometheus-example-app
      namespace: ns1
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-example-app
      template:
        metadata:
          labels:
            app: prometheus-example-app
        spec:
          containers:
          - image: ghcr.io/rhobs/prometheus-example-app:0.4.1
            imagePullPolicy: IfNotPresent
            name: prometheus-example-app
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: prometheus-example-app
      name: prometheus-example-app
      namespace: ns1
    spec:
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
        name: web
      selector:
        app: prometheus-example-app
      type: ClusterIP

    This configuration deploys a service named prometheus-example-app in the user-defined ns1 project. This service exposes the custom version metric, which you can verify directly by using the sketch after this procedure.

  3. Apply the configuration to the cluster:

    $ oc apply -f prometheus-example-app.yaml

    It takes some time to deploy the service.

  4. Check that the pod is running:

    $ oc -n ns1 get pod

    Example output

    NAME                                      READY   STATUS    RESTARTS   AGE
    prometheus-example-app-7857545cb7-sbgwq   1/1     Running   0          81m
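
After the pod reports Running, you can verify the metrics endpoint directly before setting up monitoring. A minimal check, assuming the service port of 8080 from the configuration above:

  $ oc -n ns1 port-forward svc/prometheus-example-app 8080:8080
  $ curl http://localhost:8080/metrics

The curl output should include the version metric. Because oc port-forward blocks, run the curl command from a second terminal.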

Specifying how a service is monitored

To use the metrics exposed by your service, you must configure OKD monitoring to scrape metrics from the /metrics endpoint. You can do this using a ServiceMonitor custom resource definition (CRD) that specifies how a service should be monitored, or a PodMonitor CRD that specifies how a pod should be monitored. The former requires a Service object, while the latter does not, allowing Prometheus to directly scrape metrics from the metrics endpoint exposed by a pod.
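
For comparison, a PodMonitor equivalent for the sample service could look like the following sketch. This assumes that the pod template declares a container port named web, which the sample deployment in this section does not define; the podMetricsEndpoints field references pod ports directly rather than Service ports:

  apiVersion: monitoring.coreos.com/v1
  kind: PodMonitor
  metadata:
    name: prometheus-example-monitor
    namespace: ns1
  spec:
    podMetricsEndpoints:
    - interval: 30s
      port: web
      scheme: http
    selector:
      matchLabels:
        app: prometheus-example-app

The rest of this section uses the ServiceMonitor approach.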

This procedure shows you how to create a ServiceMonitor resource for a service in a user-defined project.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role or the monitoring-edit role.

  • You have enabled monitoring for user-defined projects.

  • For this example, you have deployed the prometheus-example-app sample service in the ns1 project.

    The prometheus-example-app sample service does not support TLS authentication.

Procedure

  1. Create a YAML file for the ServiceMonitor resource configuration. In this example, the file is called example-app-service-monitor.yaml.

  2. Add the following ServiceMonitor resource configuration details:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: prometheus-example-monitor
      name: prometheus-example-monitor
      namespace: ns1
    spec:
      endpoints:
      - interval: 30s
        port: web
        scheme: http
      selector:
        matchLabels:
          app: prometheus-example-app

    This defines a ServiceMonitor resource that scrapes the metrics exposed by the prometheus-example-app sample service, which includes the version metric. You can confirm that the metric is being scraped by using the query sketch after this procedure.

    A ServiceMonitor resource in a user-defined namespace can only discover services in the same namespace. That is, the namespaceSelector field of the ServiceMonitor resource is always ignored.

  3. Apply the configuration to the cluster:

    $ oc apply -f example-app-service-monitor.yaml

    It takes some time to deploy the ServiceMonitor resource.

  4. Check that the ServiceMonitor resource is created:

    $ oc -n ns1 get servicemonitor

    Example output

    NAME                         AGE
    prometheus-example-monitor   81m
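
After Prometheus has scraped the new target at least once, you can query the collected metrics through the Thanos Querier API route, which is described in "Viewing a list of available metrics" below. A sketch, assuming you have obtained the route and can authenticate with a bearer token:

  $ curl -k -H "Authorization: Bearer $(oc whoami -t)" \
    "https://<thanos_querier_route>/api/v1/query?query=version"

A successful response contains a JSON result with a sample for the version metric from the ns1 project.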

Viewing a list of available metrics

As a cluster administrator or as a user with view permissions for all projects, you can view a list of metrics available in a cluster and output the list in JSON format.

Prerequisites

  • You have installed the OKD CLI (oc).

  • You have obtained the OKD API route for Thanos Querier.

  • You are a cluster administrator, or you have access to the cluster as a user with the cluster-monitoring-view role.

    You can only use bearer token authentication to access the Thanos Querier API route.

Procedure

  1. If you have not obtained the OKD API route for Thanos Querier, run the following command:

    $ oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}'

  2. Retrieve a list of metrics in JSON format from the Thanos Querier API route by running the following command. This command uses oc to authenticate with a bearer token.

    $ curl -k -H "Authorization: Bearer $(oc whoami -t)" https://<thanos_querier_route>/api/v1/metadata (1)

    (1) Replace <thanos_querier_route> with the OKD API route for Thanos Querier.
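
The metadata response is a JSON object in which each key under .data is a metric name and each value is a list of metadata entries with type, help, and unit fields. If you have jq installed, you can, for example, extract just the metric names:

  $ curl -k -H "Authorization: Bearer $(oc whoami -t)" \
    https://<thanos_querier_route>/api/v1/metadata | jq '.data | keys'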
