Detach CSI volumes after non-graceful node shutdown

This feature allows Container Storage Interface (CSI) drivers to automatically detach volumes when a node goes down non-gracefully.

Detach CSI volumes after non-graceful node shutdown is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Overview

A graceful node shutdown occurs when the kubelet’s node shutdown manager detects the upcoming node shutdown action. Non-graceful shutdowns occur when the kubelet does not detect a node shutdown action, which can occur because of system or hardware failures. Also, the kubelet may not detect a node shutdown action when the shutdown command does not trigger the Inhibitor Locks mechanism used by the kubelet on Linux, or because of a user error, for example, if the shutdownGracePeriod and shutdownGracePeriodCriticalPods details are not configured correctly for that node.

With this feature, when a non-graceful node shutdown occurs, you can manually add an out-of-service taint on the node to allow volumes to automatically detach from the node.

Adding an out-of-service taint manually for automatic volume detachment

Prerequisites

  • Access to the cluster with cluster-admin privileges.

Procedure

To allow volumes to detach automatically from a node after a non-graceful node shutdown:

  1. After a node is detected as unhealthy, shut down the worker node.

  2. Ensure that the node is shutdown by running the following command and checking the status:

    1. oc get node <node name> (1)
    1<node name> = name of the non-gracefully shutdown node

    If the node is not completely shut down, do not proceed with tainting the node. If the node is still up and the taint is applied, filesystem corruption can occur.

  3. Taint the corresponding node object by running the following command:

    1. oc adm taint node <node name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute (1)
    1<node name> = name of the non-gracefully shutdown node

    After the taint is applied, the volumes detach from the shutdown node allowing their disks to be attached to a different node.

    Example

    The resulting YAML file resembles the following:

    1. spec:
    2. taints:
    3. - effect: NoExecute
    4. key: node.kubernetes.io/out-of-service
    5. value: nodeshutdown
  4. Restart the node.

  5. Remove the taint.