Using the Node Maintenance Operator to place nodes in maintenance mode
You can use the Node Maintenance Operator to place nodes in maintenance mode. This is a standalone version of the Node Maintenance Operator that is independent of OKD Virtualization installation.
If you have installed OKD Virtualization, you must use the Node Maintenance Operator that is bundled with it.
About the Node Maintenance Operator
You can place nodes into maintenance mode by using the oc adm utility or by using NodeMaintenance custom resources (CRs).
The Node Maintenance Operator watches for new or deleted NodeMaintenance CRs. When the Operator detects a new NodeMaintenance CR, it cordons the referenced node so that no new workloads are scheduled on it, and it evicts all pods that can be evicted from the node. When a NodeMaintenance CR is deleted, the node that is referenced in the CR is made available for new workloads again.
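For comparison, the manual oc adm path mentioned above can be sketched as two small shell functions. This is a minimal sketch, not the Operator's implementation: the function names enter_maintenance and exit_maintenance are hypothetical, and the drain flags shown are common choices that you might need to adjust for your workloads.

```shell
# Hypothetical helpers that wrap the standard `oc adm` node commands.
# Assumes `oc` is installed and logged in with sufficient privileges.

# Cordon the node (mark it unschedulable), then evict its pods.
# The drain flags are an assumption; adjust them for your cluster.
enter_maintenance() {
  node="$1"
  oc adm cordon "$node" &&
    oc adm drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# Make the node schedulable again once the work is done.
exit_maintenance() {
  oc adm uncordon "$1"
}
```

For example, run enter_maintenance node-1.example.com before the repair and exit_maintenance node-1.example.com afterward. With the Operator, creating and deleting a NodeMaintenance CR takes the place of these two calls.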
Maintaining bare-metal nodes
When you deploy OKD on bare-metal infrastructure, you must take additional considerations into account compared to deploying on cloud infrastructure. Unlike in cloud environments, where cluster nodes are considered ephemeral, reprovisioning a bare-metal node requires significantly more time and effort.
When a bare-metal node fails, for example because of a kernel error or a network interface card (NIC) hardware failure, workloads on the failed node must be restarted on another node in the cluster while the problem node is repaired or replaced. Node maintenance mode allows cluster administrators to gracefully turn off nodes, move workloads to other parts of the cluster, and ensure that workloads are not interrupted. Detailed progress and node status information is provided during maintenance.
Installing the Node Maintenance Operator
You can install the Node Maintenance Operator by using the web console or the OpenShift CLI (oc).
Installing the Node Maintenance Operator by using the web console
You can use the OKD web console to install the Node Maintenance Operator.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
In the OKD web console, navigate to Operators → OperatorHub.
Search for the Node Maintenance Operator, then click Install.
Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-operators namespace.
Click Install.
Verification
To confirm that the installation is successful:
Navigate to the Operators → Installed Operators page.
Check that the Operator is installed in the openshift-operators namespace and that its status is Succeeded.
If the Operator is not installed successfully:
Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
Navigate to the Workloads → Pods page and check the logs in any pods in the openshift-operators project that are reporting issues.
Installing the Node Maintenance Operator by using the CLI
You can use the OpenShift CLI (oc) to install the Node Maintenance Operator.
You can install the Node Maintenance Operator in your own namespace or in the openshift-operators namespace.
To install the Operator in your own namespace, follow all the steps in the procedure.
To install the Operator in the openshift-operators namespace, skip to step 3 of the procedure, which creates the Subscription CR, because the steps to create a new Namespace custom resource (CR) and an OperatorGroup CR are not required.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
1. Create a Namespace CR for the Node Maintenance Operator:
Define the Namespace CR and save the YAML file, for example, node-maintenance-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: nmo-test
To create the Namespace CR, run the following command:
$ oc create -f node-maintenance-namespace.yaml
2. Create an OperatorGroup CR:
Define the OperatorGroup CR and save the YAML file, for example, node-maintenance-operator-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: node-maintenance-operator
  namespace: nmo-test
To create the OperatorGroup CR, run the following command:
$ oc create -f node-maintenance-operator-group.yaml
3. Create a Subscription CR:
Define the Subscription CR and save the YAML file, for example, node-maintenance-subscription.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-maintenance-operator
  namespace: nmo-test (1)
spec:
  channel: stable
  installPlanApproval: Automatic
  name: node-maintenance-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: node-maintenance-operator.v4.10.0
(1) Specify the namespace where you want to install the Node Maintenance Operator. To install the Node Maintenance Operator in the openshift-operators namespace, specify openshift-operators in the Subscription CR.
To create the Subscription CR, run the following command:
$ oc create -f node-maintenance-subscription.yaml
Verification
If you installed the Operator in your own namespace, for example nmo-test, use that namespace instead of openshift-operators in the following commands.
Verify that the installation succeeded by inspecting the CSV resource:
$ oc get csv -n openshift-operators
Example output
NAME                              DISPLAY                     VERSION   REPLACES   PHASE
node-maintenance-operator.v4.10   Node Maintenance Operator   4.10                 Succeeded
Verify that the Node Maintenance Operator is running:
$ oc get deploy -n openshift-operators
Example output
NAME                                           READY   UP-TO-DATE   AVAILABLE   AGE
node-maintenance-operator-controller-manager   1/1     1            1           10d
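The CSV check above can also be scripted so that it returns only the phase. The following is a minimal sketch under stated assumptions: the function name csv_phase is hypothetical, and the CSV name is assumed to start with node-maintenance-operator, as in the example output.

```shell
# Hypothetical helper: print the phase of the Node Maintenance Operator CSV.
# $1 is the namespace to inspect; it defaults to openshift-operators.
csv_phase() {
  ns="${1:-openshift-operators}"
  # Emit one "<name> <phase>" line per CSV, then keep the matching phase.
  oc get csv -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' |
    awk '$1 ~ /^node-maintenance-operator/ { print $2 }'
}
```

For example, csv_phase nmo-test prints the phase for an installation in the nmo-test namespace, and csv_phase with no argument checks openshift-operators.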
The Node Maintenance Operator is supported in a restricted network environment. For more information, see Using Operator Lifecycle Manager on restricted networks.
Setting a node to maintenance mode
You can place a node into maintenance mode from the web console or from the CLI by using a NodeMaintenance CR.
Setting a node to maintenance mode by using the web console
To set a node to maintenance mode, you can create a NodeMaintenance custom resource (CR) by using the web console.
Prerequisites
- Log in as a user with cluster-admin privileges.
- Install the Node Maintenance Operator from the OperatorHub.
Procedure
From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
Select the Node Maintenance Operator from the list of Operators.
In the Node Maintenance tab, click Create NodeMaintenance.
In the Create NodeMaintenance page, select the Form view or the YAML view to configure the NodeMaintenance CR.
To apply the NodeMaintenance CR that you have configured, click Create.
Verification
In the Node Maintenance tab, inspect the Status column and verify that its status is Succeeded.
Setting a node to maintenance mode by using the CLI
You can put a node into maintenance mode with a NodeMaintenance custom resource (CR). When you apply a NodeMaintenance CR, all allowed pods are evicted and the node is rendered unschedulable. Evicted pods are queued to be moved to another node in the cluster.
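Before you apply the CR, it can help to see which pods currently run on the node and are therefore candidates for eviction. The following is a minimal sketch; the function name pods_on_node is hypothetical, and it relies only on the standard spec.nodeName field selector for pods.

```shell
# Hypothetical helper: list every pod scheduled on the given node.
# $1 is the node name, for example node-1.example.com.
pods_on_node() {
  oc get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}
```

For example, pods_on_node node-1.example.com lists the pods across all namespaces that are scheduled on that node.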
Prerequisites
- Install the OpenShift CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
Create the following NodeMaintenance CR, and save the file as nodemaintenance-cr.yaml:
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: nodemaintenance-cr (1)
spec:
  nodeName: node-1.example.com (2)
  reason: "NIC replacement" (3)
(1) The name of the node maintenance CR.
(2) The name of the node to be put into maintenance mode.
(3) A plain text description of the reason for maintenance.
Apply the node maintenance CR by running the following command:
$ oc apply -f nodemaintenance-cr.yaml
Check the progress of the maintenance task by running the following command, replacing <node-name> with the name of your node, for example, node-1.example.com:
$ oc describe node <node-name>
Example output
Events:
  Type     Reason              Age   From     Message
  ----     ------              ----  ----     -------
  Normal   NodeNotSchedulable  61m   kubelet  Node node-1.example.com status is now: NodeNotSchedulable
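If you place several nodes into maintenance, generating the CR from parameters avoids editing the YAML file by hand. The following is a minimal sketch that reuses the CR fields shown above; the function names render_nm and start_maintenance are hypothetical.

```shell
# Hypothetical helper: print a NodeMaintenance CR.
# $1 = CR name, $2 = node name, $3 = reason for maintenance.
render_nm() {
  cat <<EOF
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: $1
spec:
  nodeName: $2
  reason: "$3"
EOF
}

# Hypothetical helper: render the CR and apply it from standard input.
start_maintenance() {
  render_nm "$@" | oc apply -f -
}
```

For example, start_maintenance nodemaintenance-cr node-1.example.com "NIC replacement" applies the same CR as the procedure above. Running render_nm alone prints the YAML so that you can review it first.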
Checking status of current NodeMaintenance CR tasks
You can check the status of current NodeMaintenance CR tasks.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Check the status of current node maintenance tasks by listing the NodeMaintenance CRs, which you can refer to by the short name nm:
$ oc get nm -o yaml
Example output
apiVersion: v1
items:
- apiVersion: nodemaintenance.medik8s.io/v1beta1
  kind: NodeMaintenance
  metadata:
  ...
  spec:
    nodeName: node-1.example.com
    reason: Node maintenance
  status:
    evictionPods: 3 (1)
    lastError: "Last failure message" (2)
    phase: Succeeded
    totalpods: 5 (3)
...
(1) The number of pods scheduled for eviction.
(2) The latest eviction error, if any.
(3) The total number of pods before the node entered maintenance mode.
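In a script, you might want to wait until the status.phase field shown above reports Succeeded before you start work on the node. The following is a minimal polling sketch; the function name wait_for_maintenance and its retry parameters are hypothetical, and it checks only for the Succeeded value that appears in the example output.

```shell
# Hypothetical helper: poll a NodeMaintenance CR until its phase is Succeeded.
# $1 = CR name, $2 = maximum attempts (default 30), $3 = seconds between attempts (default 10).
# Returns 0 when the phase is Succeeded, 1 if the attempts run out.
wait_for_maintenance() {
  name="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    phase="$(oc get nm "$name" -o jsonpath='{.status.phase}')"
    if [ "$phase" = "Succeeded" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_maintenance nodemaintenance-cr 60 5 checks every 5 seconds for up to 60 attempts. If the function returns a nonzero status, inspect the lastError field in the CR status.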
Resuming a node from maintenance mode
You can resume a node from maintenance mode from the web console or from the CLI by deleting its NodeMaintenance CR. Resuming a node brings it out of maintenance mode and makes it schedulable again.
Resuming a node from maintenance mode by using the web console
To resume a node from maintenance mode, you can delete a NodeMaintenance custom resource (CR) by using the web console.
Prerequisites
- Log in as a user with cluster-admin privileges.
- Install the Node Maintenance Operator from the OperatorHub.
Procedure
From the Administrator perspective in the web console, navigate to Operators → Installed Operators.
Select the Node Maintenance Operator from the list of Operators.
In the Node Maintenance tab, select the NodeMaintenance CR that you want to delete.
Click the Options menu at the end of the node and select Delete NodeMaintenance.
Verification
In the OKD console, click Compute → Nodes.
Inspect the Status column of the node for which you deleted the NodeMaintenance CR and verify that its status is Ready.
Resuming a node from maintenance mode by using the CLI
You can resume a node from maintenance mode that was initiated with a NodeMaintenance CR by deleting the NodeMaintenance CR.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
Procedure
When your node maintenance task is complete, delete the active NodeMaintenance CR:
$ oc delete -f nodemaintenance-cr.yaml
Example output
nodemaintenance.nodemaintenance.medik8s.io "nodemaintenance-cr" deleted
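After you delete the CR, you can confirm from the CLI that the node accepts workloads again. The following is a minimal sketch; the function name node_is_schedulable is hypothetical, and it relies on the standard spec.unschedulable field of the Node object, which is true only while the node is cordoned.

```shell
# Hypothetical helper: succeed (exit status 0) if the node is schedulable.
# $1 is the node name, for example node-1.example.com.
node_is_schedulable() {
  [ "$(oc get node "$1" -o jsonpath='{.spec.unschedulable}')" != "true" ]
}
```

For example, node_is_schedulable node-1.example.com && echo "node is schedulable" prints the message only after the node has been uncordoned.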