Exposing custom metrics for virtual machines

OKD includes a pre-configured, pre-installed, and self-updating monitoring stack that provides monitoring for core platform components. This monitoring stack is based on the Prometheus monitoring system. Prometheus is a time-series database and a rule evaluation engine for metrics.

In addition to using the OKD monitoring stack, you can use the CLI to enable monitoring for user-defined projects and query custom metrics that are exposed for virtual machines through the node-exporter service.

Configuring the node exporter service

The node-exporter agent is deployed on every virtual machine in the cluster from which you want to collect metrics. Configure the node-exporter agent as a service to expose internal metrics and processes that are associated with virtual machines.

Prerequisites

  • Install the OKD CLI (oc).

  • Log in to the cluster as a user with cluster-admin privileges.

  • Create the cluster-monitoring-config ConfigMap object in the openshift-monitoring project and set enableUserWorkload to true to enable monitoring for user-defined projects.

  • Configure the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project.
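
The enablement prerequisite can be sketched as a minimal cluster-monitoring-config ConfigMap; applying it with oc apply -f starts the monitoring components for user-defined projects in the openshift-user-workload-monitoring project:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```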

Procedure

  1. Create the Service YAML file. In the following example, the file is called node-exporter-service.yaml.

    kind: Service
    apiVersion: v1
    metadata:
      name: node-exporter-service (1)
      namespace: dynamation (2)
      labels:
        servicetype: metrics (3)
    spec:
      ports:
        - name: exmet (4)
          protocol: TCP
          port: 9100 (5)
          targetPort: 9100 (6)
      type: ClusterIP
      selector:
        monitor: metrics (7)
    (1) The node-exporter service that exposes the metrics from the virtual machines.
    (2) The namespace where the service is created.
    (3) The label for the service. The ServiceMonitor uses this label to match this service.
    (4) The name given to the port that exposes metrics on port 9100 for the ClusterIP service.
    (5) The TCP port on which the node-exporter-service listens for requests.
    (6) The target port on the virtual machine pod, configured with the monitor label, to which the service forwards traffic.
    (7) The label used to match the virtual machine's pods. In this example, any virtual machine pod with the label monitor and a value of metrics is matched.
  2. Create the node-exporter service:

    $ oc create -f node-exporter-service.yaml

Configuring a virtual machine with the node exporter service

Download the node-exporter archive onto the virtual machine. Then, create a systemd service that runs the node-exporter service when the virtual machine boots.

Prerequisites

  • The monitoring components for user-defined projects are running as pods in the openshift-user-workload-monitoring project.

  • Grant the monitoring-edit role to users who need to monitor this user-defined project.

Procedure

  1. Log on to the virtual machine.

  2. Download the node-exporter archive onto the virtual machine, using the download path that matches the node-exporter version.

    $ wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
  3. Extract the executable and place it in the /usr/bin directory.

    $ sudo tar xvf node_exporter-1.3.1.linux-amd64.tar.gz \
        --directory /usr/bin --strip 1 "*/node_exporter"
  4. Create a node_exporter.service file in the /etc/systemd/system directory. This systemd unit file runs the node-exporter service when the virtual machine boots.

    [Unit]
    Description=Prometheus Metrics Exporter
    After=network.target
    StartLimitIntervalSec=0

    [Service]
    Type=simple
    Restart=always
    RestartSec=1
    User=root
    ExecStart=/usr/bin/node_exporter

    [Install]
    WantedBy=multi-user.target
  5. Enable and start the systemd service.

    $ sudo systemctl enable node_exporter.service
    $ sudo systemctl start node_exporter.service

Verification

  • Verify that the node-exporter agent is reporting metrics from the virtual machine.

    $ curl http://localhost:9100/metrics

    Example output

    go_gc_duration_seconds{quantile="0"} 1.5244e-05
    go_gc_duration_seconds{quantile="0.25"} 3.0449e-05
    go_gc_duration_seconds{quantile="0.5"} 3.7913e-05

Creating a custom monitoring label for virtual machines

To enable queries to multiple virtual machines from a single service, add a custom label in the virtual machine’s YAML file.

Prerequisites

  • Install the OKD CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • You have access to the web console to stop and restart a virtual machine.

Procedure

  1. Edit the template spec of your virtual machine configuration file. In this example, the label monitor has the value metrics.

    spec:
      template:
        metadata:
          labels:
            monitor: metrics
  2. Stop and restart the virtual machine to create a new pod that carries the monitor label with the value that you assigned.
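
For context, the monitor label sits under spec.template.metadata in the full VirtualMachine manifest. A trimmed sketch follows; the name vm-example is a placeholder, and the dynamation namespace is carried over from the service example:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-example           # placeholder VM name
  namespace: dynamation      # namespace used in the service example
spec:
  template:
    metadata:
      labels:
        monitor: metrics     # matched by the service selector
    spec:
      domain:
        devices: {}          # remaining VM configuration omitted
```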

Querying the node-exporter service for metrics

Metrics are exposed for virtual machines through an HTTP service endpoint under the /metrics canonical name. When you query for metrics, Prometheus directly scrapes the metrics from the metrics endpoint exposed by the virtual machines and presents these metrics for viewing.

Prerequisites

  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

Procedure

  1. Obtain the HTTP service endpoint by specifying the namespace for the service:

    $ oc get service -n <namespace> <node-exporter-service>
  2. To list all available metrics for the node-exporter service, query the metrics resource.

    $ curl http://172.30.226.162:9100/metrics | grep -vE "^#|^$"

    Example output

    node_arp_entries{device="eth0"} 1
    node_boot_time_seconds 1.643153218e+09
    node_context_switches_total 4.4938158e+07
    node_cooling_device_cur_state{name="0",type="Processor"} 0
    node_cooling_device_max_state{name="0",type="Processor"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
    node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
    node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
    node_cpu_seconds_total{cpu="0",mode="iowait"} 37.61
    node_cpu_seconds_total{cpu="0",mode="irq"} 233.91
    node_cpu_seconds_total{cpu="0",mode="nice"} 551.47
    node_cpu_seconds_total{cpu="0",mode="softirq"} 87.3
    node_cpu_seconds_total{cpu="0",mode="steal"} 86.12
    node_cpu_seconds_total{cpu="0",mode="system"} 464.15
    node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
    node_disk_discard_time_seconds_total{device="vda"} 0
    node_disk_discard_time_seconds_total{device="vdb"} 0
    node_disk_discarded_sectors_total{device="vda"} 0
    node_disk_discarded_sectors_total{device="vdb"} 0
    node_disk_discards_completed_total{device="vda"} 0
    node_disk_discards_completed_total{device="vdb"} 0
    node_disk_discards_merged_total{device="vda"} 0
    node_disk_discards_merged_total{device="vdb"} 0
    node_disk_info{device="vda",major="252",minor="0"} 1
    node_disk_info{device="vdb",major="252",minor="16"} 1
    node_disk_io_now{device="vda"} 0
    node_disk_io_now{device="vdb"} 0
    node_disk_io_time_seconds_total{device="vda"} 174
    node_disk_io_time_seconds_total{device="vdb"} 0.054
    node_disk_io_time_weighted_seconds_total{device="vda"} 259.79200000000003
    node_disk_io_time_weighted_seconds_total{device="vdb"} 0.039
    node_disk_read_bytes_total{device="vda"} 3.71867136e+08
    node_disk_read_bytes_total{device="vdb"} 366592
    node_disk_read_time_seconds_total{device="vda"} 19.128
    node_disk_read_time_seconds_total{device="vdb"} 0.039
    node_disk_reads_completed_total{device="vda"} 5619
    node_disk_reads_completed_total{device="vdb"} 96
    node_disk_reads_merged_total{device="vda"} 5
    node_disk_reads_merged_total{device="vdb"} 0
    node_disk_write_time_seconds_total{device="vda"} 240.66400000000002
    node_disk_write_time_seconds_total{device="vdb"} 0
    node_disk_writes_completed_total{device="vda"} 71584
    node_disk_writes_completed_total{device="vdb"} 0
    node_disk_writes_merged_total{device="vda"} 19761
    node_disk_writes_merged_total{device="vdb"} 0
    node_disk_written_bytes_total{device="vda"} 2.007924224e+09
    node_disk_written_bytes_total{device="vdb"} 0
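
Because the exposition format is line-based, with each line of the form metric_name{labels} value, standard text tools can slice output like the above. A self-contained sketch, using a small captured sample in place of a live curl:

```shell
# A small sample of node-exporter output, as captured above
cat > /tmp/metrics.sample <<'EOF'
node_boot_time_seconds 1.643153218e+09
node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
EOF

# Print only the per-mode CPU counters for cpu "0";
# the second whitespace-separated field is the sample value
grep '^node_cpu_seconds_total{cpu="0"' /tmp/metrics.sample | awk '{print $1, $2}'
```

The same filter works on the full curl output piped directly, without the intermediate file.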

Creating a ServiceMonitor resource for the node exporter service

You can use a Prometheus client library and scrape metrics from the /metrics endpoint to access and view the metrics exposed by the node-exporter service. Use a ServiceMonitor custom resource definition (CRD) to monitor the node exporter service.

Prerequisites

  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

Procedure

  1. Create a YAML file for the ServiceMonitor resource configuration. In this example, the service monitor matches any service with the servicetype: metrics label and queries the exmet port every 30 seconds.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: node-exporter-metrics-monitor
      name: node-exporter-metrics-monitor (1)
      namespace: dynamation (2)
    spec:
      endpoints:
      - interval: 30s (3)
        port: exmet (4)
        scheme: http
      selector:
        matchLabels:
          servicetype: metrics
    (1) The name of the ServiceMonitor.
    (2) The namespace where the ServiceMonitor is created.
    (3) The interval at which the port is queried.
    (4) The name of the port that is queried every 30 seconds.
  2. Create the ServiceMonitor configuration for the node-exporter service.

    $ oc create -f node-exporter-metrics-monitor.yaml

Accessing the node exporter service outside the cluster

You can access the node-exporter service outside the cluster and view the exposed metrics.

Prerequisites

  • You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.

  • You have enabled monitoring for the user-defined project by configuring the node-exporter service.

Procedure

  1. Expose the node-exporter service.

    $ oc expose service -n <namespace> <node_exporter_service_name>
  2. Obtain the FQDN (Fully Qualified Domain Name) for the route.

    $ oc get route -o=custom-columns=NAME:.metadata.name,DNS:.spec.host

    Example output

    NAME                    DNS
    node-exporter-service   node-exporter-service-dynamation.apps.cluster.example.org
  3. Use the curl command to display metrics for the node-exporter service.

    $ curl -s http://node-exporter-service-dynamation.apps.cluster.example.org/metrics

    Example output

    go_gc_duration_seconds{quantile="0"} 1.5382e-05
    go_gc_duration_seconds{quantile="0.25"} 3.1163e-05
    go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
    go_gc_duration_seconds{quantile="0.75"} 4.9139e-05
    go_gc_duration_seconds{quantile="1"} 0.000189423

Additional resources