Monitoring virtual machine health

A virtual machine instance (VMI) can become unhealthy due to transient issues such as connectivity loss, deadlocks, or problems with external dependencies. A health check periodically performs diagnostics on a VMI by using any combination of the readiness and liveness probes.

About readiness and liveness probes

Use readiness and liveness probes to detect and handle unhealthy virtual machine instances (VMIs). You can include one or more probes in the specification of the VMI to ensure that traffic does not reach a VMI that is not ready for it and that a new instance is created when a VMI becomes unresponsive.

A readiness probe determines whether a VMI is ready to accept service requests. If the probe fails, the VMI is removed from the list of available endpoints until the VMI is ready.

A liveness probe determines whether a VMI is responsive. If the probe fails, the VMI is deleted and a new instance is created to restore responsiveness.

You can configure readiness and liveness probes by setting the spec.readinessProbe and the spec.livenessProbe fields of the VirtualMachineInstance object. These fields support the following tests:

HTTP GET

The probe sends an HTTP GET request to the VMI, acting as a web hook, to determine its health. The test is successful if the HTTP response code is from 200 through 399. You can use an HTTP GET test with applications that return HTTP status codes when they are fully initialized.

TCP socket

The probe attempts to open a TCP socket to the VMI. The VMI is considered healthy only if the probe can establish a connection. You can use a TCP socket test with applications that do not start listening until initialization is complete.
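To see what each test checks, you can run an equivalent check from a workstation that can reach the VMI. The following Python sketch uses only the standard library and approximates the two tests under the rules stated above (an HTTP status from 200 through 399, or a successful TCP connection). It is not the code that OpenShift Virtualization or the kubelet runs, and the function names http_get_check and tcp_socket_check are hypothetical.

```python
import http.client
import socket
from typing import Dict, Optional


def http_get_check(host: str, port: int, path: str = "/",
                   headers: Optional[Dict[str, str]] = None,
                   timeout: float = 1.0) -> bool:
    """Pass if an HTTP GET returns a status code from 200 through 399."""
    # http.client does not follow redirects, so a 3xx response itself passes.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=headers or {})
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, and malformed responses all fail.
        return False
    finally:
        conn.close()
    return 200 <= status <= 399


def tcp_socket_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Pass if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, http_get_check("<vmi_address>", 1500, "/healthz", {"Custom-Header": "Awesome"}) mirrors the HTTP samples in this section, where <vmi_address> is a placeholder for an address you can reach.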

Defining an HTTP readiness probe

Define an HTTP readiness probe by setting the spec.readinessProbe.httpGet field of the virtual machine instance (VMI) configuration.

Procedure

  1. Include details of the readiness probe in the VMI configuration file.

    Sample readiness probe with an HTTP GET test

    # ...
    spec:
      readinessProbe:
        httpGet: # (1)
          port: 1500 # (2)
          path: /healthz # (3)
          httpHeaders:
          - name: Custom-Header
            value: Awesome
        initialDelaySeconds: 120 # (4)
        periodSeconds: 20 # (5)
        timeoutSeconds: 10 # (6)
        failureThreshold: 3 # (7)
        successThreshold: 3 # (8)
    # ...
    (1) The HTTP GET request to perform to connect to the VMI.
    (2) The port of the VMI that the probe queries. In this example, the probe queries port 1500.
    (3) The path to access on the HTTP server. In this example, if the handler for the server’s /healthz path returns a success code, the VMI is considered to be healthy. If the handler returns a failure code, the VMI is removed from the list of available endpoints.
    (4) The time, in seconds, after the VMI starts before the readiness probe is initiated.
    (5) The interval, in seconds, between probes. The default is 10 seconds. This value must be greater than timeoutSeconds.
    (6) The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default is 1. This value must be less than periodSeconds.
    (7) The number of consecutive probe failures after which the VMI is marked Unready. The default is 3.
    (8) The number of consecutive successes that the probe must report, after a failure, for the probe to be considered successful. The default is 1.
  2. Create the VMI by running the following command:

    $ oc create -f <file_name>.yaml
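The failureThreshold and successThreshold fields count consecutive probe results. The following Python sketch models how readiness changes over time, assuming Kubernetes-style probe semantics: the VMI starts as not ready, and its state changes only after a run of consecutive identical results reaches the matching threshold. Probe times are idealized as initialDelaySeconds plus a multiple of periodSeconds. The ProbeSpec class and the readiness_timeline function are hypothetical illustrations, not part of any API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ProbeSpec:
    """Probe timing fields. The period and threshold defaults follow this
    section's field descriptions; the initial delay default of 0 is assumed."""
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    failure_threshold: int = 3
    success_threshold: int = 1


def readiness_timeline(
    spec: ProbeSpec, results: Sequence[bool]
) -> List[Tuple[int, bool, bool]]:
    """Return (seconds_after_start, probe_passed, vmi_ready) for each probe."""
    ready = False  # assumed: the VMI is not ready until the probe succeeds
    last: Optional[bool] = None
    run = 0
    timeline = []
    for i, passed in enumerate(results):
        # Idealized schedule: first probe after the initial delay, then one
        # probe per period (real probes can drift slightly).
        seconds = spec.initial_delay_seconds + i * spec.period_seconds
        # Count how many identical results have been seen in a row.
        if passed == last:
            run += 1
        else:
            last, run = passed, 1
        threshold = spec.success_threshold if passed else spec.failure_threshold
        if run >= threshold:
            ready = passed
        timeline.append((seconds, passed, ready))
    return timeline
```

With the sample values above, readiness_timeline(ProbeSpec(120, 20, 3, 3), [True] * 3) reports the VMI as ready only at the third probe, 160 seconds after the VMI starts, because successThreshold: 3 requires three consecutive successes.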

Defining a TCP readiness probe

Define a TCP readiness probe by setting the spec.readinessProbe.tcpSocket field of the virtual machine instance (VMI) configuration.

Procedure

  1. Include details of the TCP readiness probe in the VMI configuration file.

    Sample readiness probe with a TCP socket test

    # ...
    spec:
      readinessProbe:
        initialDelaySeconds: 120 # (1)
        periodSeconds: 20 # (2)
        tcpSocket: # (3)
          port: 1500 # (4)
        timeoutSeconds: 10 # (5)
    # ...
    (1) The time, in seconds, after the VMI starts before the readiness probe is initiated.
    (2) The interval, in seconds, between probes. The default is 10 seconds. This value must be greater than timeoutSeconds.
    (3) The TCP action to perform.
    (4) The port of the VMI that the probe queries.
    (5) The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default is 1. This value must be less than periodSeconds.
  2. Create the VMI by running the following command:

    $ oc create -f <file_name>.yaml
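The probe field descriptions in this section state that periodSeconds must be greater than timeoutSeconds, and they give defaults for omitted fields. A small pre-flight check such as the following Python sketch can catch a violation before you run oc create. The function name probe_timing_problems is hypothetical, and the check that thresholds are at least 1 is an added assumption based on the Kubernetes probe field minimums, not a rule stated in this section.

```python
from typing import Any, Dict, List

# Defaults for omitted fields, as given in the probe field descriptions.
DEFAULTS = {
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "failureThreshold": 3,
    "successThreshold": 1,
}


def probe_timing_problems(probe: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a probe's timing fields.

    `probe` is a dict shaped like the readinessProbe or livenessProbe
    mapping in the VMI specification. An empty list means no problems.
    """
    fields = {**DEFAULTS, **probe}
    problems = []
    # Rule stated in this section: periodSeconds must exceed timeoutSeconds.
    if fields["periodSeconds"] <= fields["timeoutSeconds"]:
        problems.append(
            f"periodSeconds ({fields['periodSeconds']}) must be greater than "
            f"timeoutSeconds ({fields['timeoutSeconds']})"
        )
    # Assumption: thresholds must be at least 1, as for Kubernetes probes.
    for name in ("failureThreshold", "successThreshold"):
        if fields[name] < 1:
            problems.append(f"{name} must be at least 1, got {fields[name]}")
    return problems
```

Because omitted fields fall back to the documented defaults, a probe that sets only timeoutSeconds: 30 is reported, since the default periodSeconds of 10 is not greater than 30.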

Defining an HTTP liveness probe

Define an HTTP liveness probe by setting the spec.livenessProbe.httpGet field of the virtual machine instance (VMI) configuration. You can define both HTTP and TCP tests for liveness probes in the same way as readiness probes. This procedure configures a sample liveness probe with an HTTP GET test.

Procedure

  1. Include details of the HTTP liveness probe in the VMI configuration file.

    Sample liveness probe with an HTTP GET test

    # ...
    spec:
      livenessProbe:
        initialDelaySeconds: 120 # (1)
        periodSeconds: 20 # (2)
        httpGet: # (3)
          port: 1500 # (4)
          path: /healthz # (5)
          httpHeaders:
          - name: Custom-Header
            value: Awesome
        timeoutSeconds: 10 # (6)
    # ...
    (1) The time, in seconds, after the VMI starts before the liveness probe is initiated.
    (2) The interval, in seconds, between probes. The default is 10 seconds. This value must be greater than timeoutSeconds.
    (3) The HTTP GET request to perform to connect to the VMI.
    (4) The port of the VMI that the probe queries. In this example, the probe queries port 1500, where a minimal HTTP server that cloud-init installs and starts in the VMI is listening.
    (5) The path to access on the HTTP server. In this example, if the handler for the server’s /healthz path returns a success code, the VMI is considered to be healthy. If the handler returns a failure code, the VMI is deleted and a new instance is created.
    (6) The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default is 1. This value must be less than periodSeconds.
  2. Create the VMI by running the following command:

    $ oc create -f <file_name>.yaml
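The template at the end of this section uses nc to answer every request on port 1500 with 200 OK, so the /healthz path in the liveness sample always succeeds. If the guest image includes Python 3, a server that answers only /healthz gives the probe a path-specific signal. The following is a hypothetical sketch that uses the Python standard library; the HealthzHandler class and the make_server helper are assumptions, not part of the template, and a real handler would check the application's own state before returning 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthzHandler(BaseHTTPRequestHandler):
    """Answer GET /healthz with 200 and every other path with 404."""

    def do_GET(self):
        if self.path == "/healthz":
            status, body = 200, b"ok\n"
        else:
            status, body = 404, b"not found\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Suppress the default per-request logging to stderr.
        pass


def make_server(port: int = 1500, host: str = "0.0.0.0") -> HTTPServer:
    """Create, but do not start, the server; port 1500 matches the samples."""
    return HTTPServer((host, port), HealthzHandler)
```

For example, saved as healthz.py in the current directory, it could be started with python3 -c 'import healthz; healthz.make_server().serve_forever()'. Binding to 0.0.0.0 listens on all guest interfaces, because the probe connection does not originate inside the guest.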

Template: Virtual machine configuration file for defining health checks

  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  metadata:
    labels:
      special: vm-fedora
    name: vm-fedora
  spec:
    template:
      metadata:
        labels:
          special: vm-fedora
      spec:
        domain:
          devices:
            disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
          resources:
            requests:
              memory: 1024M
        readinessProbe:
          httpGet:
            port: 1500
          initialDelaySeconds: 120
          periodSeconds: 20
          timeoutSeconds: 10
          failureThreshold: 3
          successThreshold: 3
        terminationGracePeriodSeconds: 0
        volumes:
        - name: containerdisk
          containerDisk:
            image: kubevirt/fedora-cloud-registry-disk-demo
        - cloudInitNoCloud:
            userData: |-
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
              bootcmd:
                - setenforce 0
                - dnf install -y nmap-ncat
                - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
          name: cloudinitdisk
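In a VirtualMachine object, as in the template above, the probe fields belong to the VMI template at spec.template.spec rather than at the top-level spec. If you generate manifests programmatically, the following Python sketch sets a probe at that path. The with_probe helper is hypothetical, and it accepts only the two probe fields covered in this section.

```python
import copy
from typing import Any, Dict

PROBE_FIELDS = ("readinessProbe", "livenessProbe")


def with_probe(vm: Dict[str, Any], field: str,
               probe: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a VirtualMachine manifest with a probe set.

    For a VirtualMachine, probes belong in the VMI template at
    spec.template.spec, not at the top-level spec.
    """
    if field not in PROBE_FIELDS:
        raise ValueError(f"unsupported probe field: {field}")
    result = copy.deepcopy(vm)  # leave the caller's manifest unchanged
    vmi_spec = result["spec"]["template"]["spec"]
    vmi_spec[field] = copy.deepcopy(probe)
    return result
```

Because oc create -f also accepts JSON, you can write the result with json.dump to a file such as <file_name>.json and create the virtual machine from that file.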

Additional resources