Rolling back to the OpenShift SDN network provider

As a cluster administrator, you can rollback to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin if the migration to OVN-Kubernetes is unsuccessful.

Migrating to the OpenShift SDN network plugin

As a cluster administrator, you can migrate to the OpenShift SDN Container Network Interface (CNI) network plugin. During the migration you must reboot every node in your cluster.

Only rollback to OpenShift SDN if the migration to OVN-Kubernetes fails.

Prerequisites

  • Install the OpenShift CLI (oc).

  • Access to the cluster as a user with the cluster-admin role.

  • A cluster installed on infrastructure configured with the OVN-Kubernetes network plugin.

  • A recent backup of the etcd database is available.

  • A reboot can be triggered manually for each node.

  • The cluster is in a known good state, without any errors.

Procedure

  1. Stop all of the machine configuration pools managed by the Machine Config Operator (MCO):

    • Stop the master configuration pool:

      1. $ oc patch MachineConfigPool master --type='merge' --patch \
      2. '{ "spec": { "paused": true } }'
    • Stop the worker machine configuration pool:

      1. $ oc patch MachineConfigPool worker --type='merge' --patch \
      2. '{ "spec":{ "paused": true } }'
  2. To prepare for the migration, set the migration field to null by entering the following command:

    1. $ oc patch Network.operator.openshift.io cluster --type='merge' \
    2. --patch '{ "spec": { "migration": null } }'
  3. To start the migration, set the network plugin back to OpenShift SDN by entering the following commands:

    1. $ oc patch Network.operator.openshift.io cluster --type='merge' \
    2. --patch '{ "spec": { "migration": { "networkType": "OpenShiftSDN" } } }'
    3. $ oc patch Network.config.openshift.io cluster --type='merge' \
    4. --patch '{ "spec": { "networkType": "OpenShiftSDN" } }'
  4. Optional: You can disable automatic migration of several OVN-Kubernetes capabilities to the OpenShift SDN equivalents:

    • Egress IPs

    • Egress firewall

    • Multicast

    To disable automatic migration of the configuration for any of the previously noted OpenShift SDN features, specify the following keys:

    1. $ oc patch Network.operator.openshift.io cluster --type='merge' \
    2. --patch '{
    3. "spec": {
    4. "migration": {
    5. "networkType": "OpenShiftSDN",
    6. "features": {
    7. "egressIP": <bool>,
    8. "egressFirewall": <bool>,
    9. "multicast": <bool>
    10. }
    11. }
    12. }
    13. }'

    where:

    bool: Specifies whether to enable migration of the feature. The default is true.

  5. Optional: You can customize the following settings for OpenShift SDN to meet your network infrastructure requirements:

    • Maximum transmission unit (MTU)

    • VXLAN port

    To customize either or both of the previously noted settings, customize and enter the following command. If you do not need to change the default value, omit the key from the patch.

    1. $ oc patch Network.operator.openshift.io cluster --type=merge \
    2. --patch '{
    3. "spec":{
    4. "defaultNetwork":{
    5. "openshiftSDNConfig":{
    6. "mtu":<mtu>,
    7. "vxlanPort":<port>
    8. }}}}'

    mtu

    The MTU for the VXLAN overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to 50 less than the smallest node MTU value.

    port

    The UDP port for the VXLAN overlay network. If a value is not specified, the default is 4789. The port cannot be the same as the Geneve port that is used by OVN-Kubernetes. The default value for the Geneve port is 6081.

    Example patch command

    1. $ oc patch Network.operator.openshift.io cluster --type=merge \
    2. --patch '{
    3. "spec":{
    4. "defaultNetwork":{
    5. "openshiftSDNConfig":{
    6. "mtu":1200
    7. }}}}'
  6. Wait until the Multus daemon set rollout completes.

    1. $ oc -n openshift-multus rollout status daemonset/multus

    The name of the Multus pods is in form of multus-<xxxxx> where <xxxxx> is a random sequence of letters. It might take several moments for the pods to restart.

    Example output

    1. Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated...
    2. ...
    3. Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available...
    4. daemon set "multus" successfully rolled out
  7. To complete changing the network plugin, reboot each node in your cluster. You can reboot the nodes in your cluster with either of the following approaches:

    • With the oc rsh command, you can use a bash script similar to the following:

      1. #!/bin/bash
      2. readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')"
      3. for i in "${POD_NODES[@]}"
      4. do
      5. read -r POD NODE <<< "$i"
      6. until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1
      7. do
      8. echo "cannot reboot node $NODE, retry" && sleep 3
      9. done
      10. done
    • With the ssh command, you can use a bash script similar to the following. The script assumes that you have configured sudo to not prompt for a password.

      1. #!/bin/bash
      2. for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
      3. do
      4. echo "reboot node $ip"
      5. ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3
      6. done
  8. After the nodes in your cluster have rebooted, start all of the machine configuration pools:

    • Start the master configuration pool:

      1. $ oc patch MachineConfigPool master --type='merge' --patch \
      2. '{ "spec": { "paused": false } }'
    • Start the worker configuration pool:

      1. $ oc patch MachineConfigPool worker --type='merge' --patch \
      2. '{ "spec": { "paused": false } }'

    As the MCO updates machines in each config pool, it reboots each node.

    By default the MCO updates a single machine per pool at a time, so the time that the migration requires to complete grows with the size of the cluster.

  9. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      1. $ oc describe node | egrep "hostname|machineconfig"

      Example output

      1. kubernetes.io/hostname=master-0
      2. machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      3. machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      4. machineconfiguration.openshift.io/reason:
      5. machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.

      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.

    2. To confirm that the machine config is correct, enter the following command:

      1. $ oc get machineconfig <config_name> -o yaml

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

  10. Confirm that the migration succeeded:

    1. To confirm that the network plugin is OpenShift SDN, enter the following command. The value of status.networkType must be OpenShiftSDN.

      1. $ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
    2. To confirm that the cluster nodes are in the Ready state, enter the following command:

      1. $ oc get nodes
    3. If a node is stuck in the NotReady state, investigate the machine config daemon pod logs and resolve any errors.

      1. To list the pods, enter the following command:

        1. $ oc get pod -n openshift-machine-config-operator

        Example output

        1. NAME READY STATUS RESTARTS AGE
        2. machine-config-controller-75f756f89d-sjp8b 1/1 Running 0 37m
        3. machine-config-daemon-5cf4b 2/2 Running 0 43h
        4. machine-config-daemon-7wzcd 2/2 Running 0 43h
        5. machine-config-daemon-fc946 2/2 Running 0 43h
        6. machine-config-daemon-g2v28 2/2 Running 0 43h
        7. machine-config-daemon-gcl4f 2/2 Running 0 43h
        8. machine-config-daemon-l5tnv 2/2 Running 0 43h
        9. machine-config-operator-79d9c55d5-hth92 1/1 Running 0 37m
        10. machine-config-server-bsc8h 1/1 Running 0 43h
        11. machine-config-server-hklrm 1/1 Running 0 43h
        12. machine-config-server-k9rtx 1/1 Running 0 43h

        The names for the config daemon pods are in the following format: machine-config-daemon-<seq>. The <seq> value is a random five character alphanumeric sequence.

      2. To display the pod log for each machine config daemon pod shown in the previous output, enter the following command:

        1. $ oc logs <pod> -n openshift-machine-config-operator

        where pod is the name of a machine config daemon pod.

      3. Resolve any errors in the logs shown by the output from the previous command.

    4. To confirm that your pods are not in an error state, enter the following command:

      1. $ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'

      If pods on a node are in an error state, reboot that node.

  11. Complete the following steps only if the migration succeeds and your cluster is in a good state:

    1. To remove the migration configuration from the Cluster Network Operator configuration object, enter the following command:

      1. $ oc patch Network.operator.openshift.io cluster --type='merge' \
      2. --patch '{ "spec": { "migration": null } }'
    2. To remove the OVN-Kubernetes configuration, enter the following command:

      1. $ oc patch Network.operator.openshift.io cluster --type='merge' \
      2. --patch '{ "spec": { "defaultNetwork": { "ovnKubernetesConfig":null } } }'
    3. To remove the OVN-Kubernetes network provider namespace, enter the following command:

      1. $ oc delete namespace openshift-ovn-kubernetes