Monitoring virtual machine health

About readiness and liveness probes
Defining an HTTP readiness probe
Defining a TCP readiness probe
Defining an HTTP liveness probe
Template: Virtual machine configuration file for defining health checks
Additional resources

A virtual machine instance (VMI) can become unhealthy due to transient issues such as connectivity loss, deadlocks, or problems with external dependencies. A health check periodically runs diagnostics on a VMI by using any combination of readiness and liveness probes.

About readiness and liveness probes

Use readiness and liveness probes to detect and handle unhealthy virtual machine instances (VMIs). You can include one or more probes in the specification of the VMI to ensure that traffic does not reach a VMI that is not ready for it and that a new instance is created when a VMI becomes unresponsive.

A readiness probe determines whether a VMI is ready to accept service requests. If the probe fails, the VMI is removed from the list of available endpoints until the VMI is ready.

A liveness probe determines whether a VMI is responsive. If the probe fails, the VMI is deleted and a new instance is created to restore responsiveness.

You can configure readiness and liveness probes by setting the `spec.readinessProbe` and the `spec.livenessProbe` fields of the `VirtualMachineInstance` object. These fields support the following tests:

HTTP GET

The probe determines the health of the VMI by using a web hook. The test is successful if the HTTP response code is in the range 200 to 399, inclusive. You can use an HTTP GET test with applications that return HTTP status codes when they are completely initialized.

TCP socket

The probe attempts to open a socket to the VMI. The VMI is only considered healthy if the probe can establish a connection. You can use a TCP socket test with applications that do not start listening until initialization is complete.
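Both probes can be set in the same specification. The following fragment is a minimal sketch rather than a tested configuration: it assumes a guest application that serves HTTP on port 1500 under a `/healthz` path (the same illustrative values used in the samples in this document) and attaches an HTTP GET test to each probe.

```yaml
# ...
spec:
  readinessProbe:          # failing: VMI is removed from the available endpoints
    httpGet:
      port: 1500           # assumed application port
      path: /healthz       # assumed health endpoint
    initialDelaySeconds: 120
    periodSeconds: 20
    timeoutSeconds: 10
  livenessProbe:           # failing: VMI is deleted and a new instance is created
    httpGet:
      port: 1500
      path: /healthz
    initialDelaySeconds: 120
    periodSeconds: 20
    timeoutSeconds: 10
# ...
```

Because a failed liveness probe replaces the instance while a failed readiness probe only withholds traffic, it may be worth giving the liveness probe the more tolerant settings of the two.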

Defining an HTTP readiness probe

Define an HTTP readiness probe by setting the `spec.readinessProbe.httpGet` field of the virtual machine instance (VMI) configuration.

Procedure

Include details of the readiness probe in the VMI configuration file.

Sample readiness probe with an HTTP GET test

# ...
spec:
  readinessProbe:
    httpGet: # (1)
      port: 1500 # (2)
      path: /healthz # (3)
      httpHeaders:
      - name: Custom-Header
        value: Awesome
    initialDelaySeconds: 120 # (4)
    periodSeconds: 20 # (5)
    timeoutSeconds: 10 # (6)
    failureThreshold: 3 # (7)
    successThreshold: 3 # (8)
# ...

1	The HTTP GET request to perform to connect to the VMI.
2	The port of the VMI that the probe queries. In the above example, the probe queries port 1500.
3	The path to access on the HTTP server. In the above example, if the handler for the server’s `/healthz` path returns a success code, the VMI is considered to be healthy. If the handler returns a failure code, the VMI is removed from the list of available endpoints.
4	The time, in seconds, after the VMI starts before the readiness probe is initiated.
5	The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than `timeoutSeconds`.
6	The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default value is 1. This value must be lower than `periodSeconds`.
7	The number of consecutive times that the probe is allowed to fail. The default is 3. After the specified number of failed attempts, the pod is marked `Unready`.
8	The number of consecutive times that the probe must report success, after a failure, to be considered successful. The default is 1.

Create the VMI by running the following command:
```
$ oc create -f <file_name>.yaml
```
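Several of the fields in the sample have defaults, so a shorter probe definition is possible. The fragment below is a sketch that relies on the default values stated in the callouts above; the port and path are the same illustrative values as in the sample and are assumptions about the guest application.

```yaml
# ...
spec:
  readinessProbe:
    httpGet:
      port: 1500           # assumed application port
      path: /healthz       # assumed health endpoint
    initialDelaySeconds: 120
    # periodSeconds omitted: defaults to 10
    # timeoutSeconds omitted: defaults to 1
    # failureThreshold omitted: defaults to 3
    # successThreshold omitted: defaults to 1
# ...
```

With the defaults, `timeoutSeconds` (1) stays below `periodSeconds` (10), so the constraint between the two fields is still satisfied.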

Defining a TCP readiness probe

Define a TCP readiness probe by setting the `spec.readinessProbe.tcpSocket` field of the virtual machine instance (VMI) configuration.

Procedure

Include details of the TCP readiness probe in the VMI configuration file.

Sample readiness probe with a TCP socket test

# ...
spec:
  readinessProbe:
    initialDelaySeconds: 120 # (1)
    periodSeconds: 20 # (2)
    tcpSocket: # (3)
      port: 1500 # (4)
    timeoutSeconds: 10 # (5)
# ...

1	The time, in seconds, after the VMI starts before the readiness probe is initiated.
2	The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than `timeoutSeconds`.
3	The TCP action to perform.
4	The port of the VMI that the probe queries.
5	The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default value is 1. This value must be lower than `periodSeconds`.

Create the VMI by running the following command:
```
$ oc create -f <file_name>.yaml
```

Defining an HTTP liveness probe

Define an HTTP liveness probe by setting the `spec.livenessProbe.httpGet` field of the virtual machine instance (VMI) configuration. You can define both HTTP and TCP tests for liveness probes in the same way as readiness probes. This procedure configures a sample liveness probe with an HTTP GET test.

Procedure

Include details of the HTTP liveness probe in the VMI configuration file.

Sample liveness probe with an HTTP GET test

# ...
spec:
  livenessProbe:
    initialDelaySeconds: 120 # (1)
    periodSeconds: 20 # (2)
    httpGet: # (3)
      port: 1500 # (4)
      path: /healthz # (5)
      httpHeaders:
      - name: Custom-Header
        value: Awesome
    timeoutSeconds: 10 # (6)
# ...

1	The time, in seconds, after the VMI starts before the liveness probe is initiated.
2	The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than `timeoutSeconds`.
3	The HTTP GET request to perform to connect to the VMI.
4	The port of the VMI that the probe queries. In the above example, the probe queries port 1500. In the template configuration file in this document, cloud-init installs and runs a minimal HTTP server on port 1500 in the VMI.
5	The path to access on the HTTP server. In the above example, if the handler for the server’s `/healthz` path returns a success code, the VMI is considered to be healthy. If the handler returns a failure code, the VMI is deleted and a new instance is created.
6	The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default value is 1. This value must be lower than `periodSeconds`.

Create the VMI by running the following command:
```
$ oc create -f <file_name>.yaml
```
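Because liveness probes accept the same tests as readiness probes, a TCP variant can be written by analogy with the TCP readiness sample. The following fragment is a sketch under that assumption, with the port and timing values carried over from the earlier samples rather than taken from a tested configuration.

```yaml
# ...
spec:
  livenessProbe:
    initialDelaySeconds: 120   # wait after the VMI starts before probing
    periodSeconds: 20          # must be greater than timeoutSeconds
    tcpSocket:
      port: 1500               # assumed port that the guest application listens on
    timeoutSeconds: 10         # must be lower than periodSeconds
# ...
```

A TCP test only confirms that the port accepts connections, so it suits applications that stop listening when they become unresponsive.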

Template: Virtual machine configuration file for defining health checks

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    special: vm-fedora
  name: vm-fedora
spec:
  template:
    metadata:
      labels:
        special: vm-fedora
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
        resources:
          requests:
            memory: 1024M
      readinessProbe:
        httpGet:
          port: 1500
        initialDelaySeconds: 120
        periodSeconds: 20
        timeoutSeconds: 10
        failureThreshold: 3
        successThreshold: 3
      terminationGracePeriodSeconds: 0
      volumes:
      - name: containerdisk
        containerDisk:
          image: kubevirt/fedora-cloud-registry-disk-demo
      - cloudInitNoCloud:
          userData: |-
            #cloud-config
            password: fedora
            chpasswd: { expire: False }
            bootcmd:
              - setenforce 0
              - dnf install -y nmap-ncat
              - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
        name: cloudinitdisk
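To see the effect of the readiness probe, traffic has to reach the VMI through a service. The manifest below is a hypothetical sketch, not part of the template: the service name `vm-fedora-http` is invented for illustration, and it assumes that the `special: vm-fedora` label from the template is applied to the pod that runs the VMI so that the selector can match it.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vm-fedora-http       # hypothetical name
spec:
  selector:
    special: vm-fedora       # label taken from the template above
  ports:
  - protocol: TCP
    port: 1500               # port exposed by the service
    targetPort: 1500         # port of the HTTP server started by cloud-init
```

While the readiness probe is failing, the VMI should not appear among the endpoints of such a service.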

Additional resources

Monitoring application health by using health checks