Expanding the cluster

After deploying an installer-provisioned OKD cluster, you can use the following procedures to expand the number of worker nodes. Ensure that each prospective worker node meets the prerequisites.

Expanding the cluster by using Redfish Virtual Media requires meeting minimum firmware requirements. For details, see “Firmware requirements for installing with virtual media” in the Prerequisites section.

Preparing the bare metal node

To expand your cluster, you must provide the node with the relevant IP address. You can do this with a static configuration or with a Dynamic Host Configuration Protocol (DHCP) server. When expanding the cluster by using a DHCP server, each node must have a DHCP reservation.
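
Every node that gets its address from DHCP needs its own reservation, so it can help to check your reservation list before you add hosts. The following Python sketch assumes a hypothetical inventory that maps each prospective worker's MAC address to its reserved IP address. It reports duplicate MAC addresses, duplicate IP addresses, and addresses outside the machine network CIDR. It does not query any DHCP server.

```python
import ipaddress


def check_reservations(reservations, machine_cidr):
    """Return a list of problems in a MAC-to-IP reservation table.

    `reservations` maps each prospective worker's NIC MAC address to the IP
    address reserved for it on the DHCP server. This inventory format is
    hypothetical; adapt it to however you track reservations.
    """
    network = ipaddress.ip_network(machine_cidr)
    problems = []
    seen_macs = set()
    owner_of_ip = {}
    for mac, ip in reservations.items():
        mac = mac.lower()  # MAC addresses compare case-insensitively
        if mac in seen_macs:
            problems.append(f"duplicate MAC {mac}")
        seen_macs.add(mac)
        addr = ipaddress.ip_address(ip)
        if addr not in network:
            problems.append(f"{addr} ({mac}) is outside {network}")
        if addr in owner_of_ip:
            problems.append(f"{addr} is reserved for both {owner_of_ip[addr]} and {mac}")
        else:
            owner_of_ip[addr] = mac
    return problems


# Illustrative values only.
print(check_reservations(
    {"52:54:00:aa:bb:01": "192.168.111.24", "52:54:00:aa:bb:02": "192.168.111.25"},
    "192.168.111.0/24",
))
```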

Reserving IP addresses so they become static IP addresses

Some administrators prefer to use static IP addresses so that each node’s IP address remains constant in the absence of a DHCP server. To configure static IP addresses with NMState, see “Optional: Configuring host network interfaces in the install-config.yaml file” in the “Setting up the environment for an OpenShift installation” section for additional details.
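
The NMState document in the static bmh.yaml example later in this procedure has a fixed shape: one interface with an IPv4 address and prefix length, a DNS server, and a default route. The following Python sketch builds that structure from a few values and rejects a gateway that is not on the node's subnet. Some details are assumptions:

* It handles IPv4 only.
* The interface name and addresses are placeholders.
* The output is JSON, which is a subset of YAML 1.2. Converting it to block YAML for the nmstate field is left to you.

```python
import ipaddress
import json


def static_nmstate(nic, address_with_prefix, dns_server, gateway):
    """Build an NMState document shaped like the static bmh.yaml example.

    IPv4 only. `address_with_prefix` looks like "192.168.111.30/24".
    """
    iface = ipaddress.ip_interface(address_with_prefix)
    if iface.version != 4:
        raise ValueError("this sketch only handles IPv4")
    gw = ipaddress.ip_address(gateway)
    # In this single-NIC layout the default route's next hop should be on the node's subnet.
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is not in {iface.network}")
    return {
        "interfaces": [{
            "name": nic,
            "type": "ethernet",
            "state": "up",
            "ipv4": {
                "address": [{"ip": str(iface.ip),
                             "prefix-length": iface.network.prefixlen}],
                "enabled": True,
            },
        }],
        "dns-resolver": {"config": {"server": [dns_server]}},
        "routes": {"config": [{
            "destination": "0.0.0.0/0",
            "next-hop-address": str(gw),
            "next-hop-interface": nic,
        }]},
    }


# Illustrative values only.
doc = static_nmstate("enp2s0", "192.168.111.30/24", "192.168.111.1", "192.168.111.1")
print(json.dumps(doc, indent=2))
```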

Preparing the bare metal node requires executing the following procedure from the provisioner node.

Procedure

  1. Get the oc binary:

    $ curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp/$VERSION/openshift-client-linux-$VERSION.tar.gz | tar zxvf - oc
    $ sudo cp oc /usr/local/bin

  2. Power off the bare metal node by using the baseboard management controller (BMC), and ensure it is off.

  3. Retrieve the user name and password of the bare metal node’s baseboard management controller. Then, create base64 strings from the user name and password:

    $ echo -ne "root" | base64
    $ echo -ne "password" | base64

  4. Create a configuration file for the bare metal node. Depending on whether you are using a static configuration or a DHCP server, use one of the following example bmh.yaml files, replacing values in the YAML to match your environment:

    $ vim bmh.yaml

    • Static configuration bmh.yaml:

      ---
      apiVersion: v1 (1)
      kind: Secret
      metadata:
        name: openshift-worker-<num>-network-config-secret (2)
        namespace: openshift-machine-api
      type: Opaque
      stringData:
        nmstate: | (3)
          interfaces: (4)
          - name: <nic1_name> (5)
            type: ethernet
            state: up
            ipv4:
              address:
              - ip: <ip_address> (5)
                prefix-length: 24
              enabled: true
          dns-resolver:
            config:
              server:
              - <dns_ip_address> (5)
          routes:
            config:
            - destination: 0.0.0.0/0
              next-hop-address: <next_hop_ip_address> (5)
              next-hop-interface: <next_hop_nic1_name> (5)
      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: openshift-worker-<num>-bmc-secret (2)
        namespace: openshift-machine-api
      type: Opaque
      data:
        username: <base64_of_uid> (6)
        password: <base64_of_pwd> (6)
      ---
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        name: openshift-worker-<num> (2)
        namespace: openshift-machine-api
      spec:
        online: True
        bootMACAddress: <nic1_mac_address> (7)
        bmc:
          address: <protocol>://<bmc_url> (8)
          credentialsName: openshift-worker-<num>-bmc-secret (2)
          disableCertificateVerification: True (9)
          username: <bmc_username> (10)
          password: <bmc_password> (10)
        rootDeviceHints:
          deviceName: <root_device_hint> (11)
        preprovisioningNetworkDataName: openshift-worker-<num>-network-config-secret (12)
      (1) To configure the network interface for a newly created node, specify the name of the secret that contains the network configuration. Follow the NMState syntax to define the network configuration for your node. See “Optional: Configuring host network interfaces in the install-config.yaml file” for details on configuring NMState syntax.
      (2) Replace <num> with the worker number of the bare metal node in the name fields, the credentialsName field, and the preprovisioningNetworkDataName field.
      (3) Add the NMState YAML syntax to configure the host interfaces.
      (4) Optional: If you have configured the network interface with NMState and you want to disable an interface, set state: up with the IP addresses set to enabled: false, as shown:

            ---
            interfaces:
            - name: <nic_name>
              type: ethernet
              state: up
              ipv4:
                enabled: false
              ipv6:
                enabled: false

      (5) Replace <nic1_name>, <ip_address>, <dns_ip_address>, <next_hop_ip_address>, and <next_hop_nic1_name> with appropriate values.
      (6) Replace <base64_of_uid> and <base64_of_pwd> with the base64 strings of the user name and password.
      (7) Replace <nic1_mac_address> with the MAC address of the bare metal node’s first NIC.
      (8) Replace <protocol> with the BMC protocol, such as IPMI, Redfish, or others. Replace <bmc_url> with the URL of the bare metal node’s baseboard management controller. See the “BMC addressing” section for additional BMC configuration options.
      (9) To skip certificate validation, set disableCertificateVerification to true.
      (10) Replace <bmc_username> and <bmc_password> with the BMC user name and password strings.
      (11) Optional: Replace <root_device_hint> with a device path if you specify a root device hint.
      (12) Optional: If you have configured the network interface for the newly created node, provide the name of the network configuration secret in the preprovisioningNetworkDataName field of the BareMetalHost CR.
    • DHCP configuration bmh.yaml:

      ---
      apiVersion: v1
      kind: Secret
      metadata:
        name: openshift-worker-<num>-bmc-secret (1)
        namespace: openshift-machine-api
      type: Opaque
      data:
        username: <base64_of_uid> (2)
        password: <base64_of_pwd> (2)
      ---
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        name: openshift-worker-<num> (1)
        namespace: openshift-machine-api
      spec:
        online: True
        bootMACAddress: <nic1_mac_address> (3)
        bmc:
          address: <protocol>://<bmc_url> (4)
          credentialsName: openshift-worker-<num>-bmc-secret (1)
          disableCertificateVerification: True (5)
          username: <bmc_username> (6)
          password: <bmc_password> (6)
        rootDeviceHints:
          deviceName: <root_device_hint> (7)
        preprovisioningNetworkDataName: openshift-worker-<num>-network-config-secret (8)
      (1) Replace <num> with the worker number of the bare metal node in the name fields, the credentialsName field, and the preprovisioningNetworkDataName field.
      (2) Replace <base64_of_uid> and <base64_of_pwd> with the base64 strings of the user name and password.
      (3) Replace <nic1_mac_address> with the MAC address of the bare metal node’s first NIC.
      (4) Replace <protocol> with the BMC protocol, such as IPMI, Redfish, or others. Replace <bmc_url> with the URL of the bare metal node’s baseboard management controller. See the “BMC addressing” section for additional BMC configuration options.
      (5) To skip certificate validation, set disableCertificateVerification to true.
      (6) Replace <bmc_username> and <bmc_password> with the BMC user name and password strings.
      (7) Optional: Replace <root_device_hint> with a device path if you specify a root device hint.
      (8) Optional: If you have configured the network interface for the newly created node, provide the name of the network configuration secret in the preprovisioningNetworkDataName field of the BareMetalHost CR.

    If the MAC address of an existing bare metal node matches the MAC address of a bare metal host that you are attempting to provision, the Ironic installation fails. If the host enrollment, inspection, cleaning, or other Ironic steps fail, the Bare Metal Operator retries the installation continuously. See “Diagnosing a duplicate MAC address when provisioning a new host in the cluster” for more information.

  5. Create the bare metal node:

    $ oc -n openshift-machine-api create -f bmh.yaml

    Example output

    secret/openshift-worker-<num>-network-config-secret created
    secret/openshift-worker-<num>-bmc-secret created
    baremetalhost.metal3.io/openshift-worker-<num> created

    Where <num> is the worker number. If you used the DHCP configuration, the network-config-secret line does not appear, because that bmh.yaml does not define that secret.

  6. Power up and inspect the bare metal node:

    $ oc -n openshift-machine-api get bmh openshift-worker-<num>

    Where <num> is the worker node number.

    Example output

    NAME                     STATE       CONSUMER   ONLINE   ERROR
    openshift-worker-<num>   available              true

    To allow the worker node to join the cluster, scale the MachineSet object to match the number of BareMetalHost objects. You can scale nodes either manually or automatically. To scale nodes automatically, add the metal3.io/autoscale-to-hosts annotation to the MachineSet object.
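
Steps 3 and 4 produce the base64 credentials and bmh.yaml by hand. The following Python sketch generates equivalent objects programmatically. The assumptions are:

* The MAC address and BMC address are illustrative. Check the redfish-virtualmedia URL format against “BMC addressing”.
* It emits a single v1 List in JSON rather than a multi-document YAML file.
* It relies on credentialsName for the BMC credentials instead of also setting username and password under bmc.

```python
import base64
import json

NAMESPACE = "openshift-machine-api"


def b64(value):
    # Equivalent to `echo -ne "<value>" | base64`: no trailing newline is encoded.
    return base64.b64encode(value.encode()).decode()


def worker_manifests(num, mac, bmc_address, username, password,
                     nmstate=None, root_device=None):
    """Return a v1 List with the Secrets and the BareMetalHost for one worker."""
    name = f"openshift-worker-{num}"
    items = []
    spec = {
        "online": True,
        "bootMACAddress": mac,
        "bmc": {
            "address": bmc_address,
            # The BMC user name and password are read from this Secret.
            "credentialsName": f"{name}-bmc-secret",
            "disableCertificateVerification": True,  # as in the examples; False verifies certs
        },
    }
    if nmstate is not None:
        # Static configuration: ship the NMState document in its own Secret.
        net_secret = f"{name}-network-config-secret"
        items.append({
            "apiVersion": "v1", "kind": "Secret", "type": "Opaque",
            "metadata": {"name": net_secret, "namespace": NAMESPACE},
            "stringData": {"nmstate": json.dumps(nmstate)},
        })
        spec["preprovisioningNetworkDataName"] = net_secret
    if root_device is not None:
        spec["rootDeviceHints"] = {"deviceName": root_device}
    items.append({
        "apiVersion": "v1", "kind": "Secret", "type": "Opaque",
        "metadata": {"name": f"{name}-bmc-secret", "namespace": NAMESPACE},
        "data": {"username": b64(username), "password": b64(password)},
    })
    items.append({
        "apiVersion": "metal3.io/v1alpha1", "kind": "BareMetalHost",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": spec,
    })
    return {"apiVersion": "v1", "kind": "List", "items": items}


# DHCP-style host with illustrative values.
manifests = worker_manifests(
    3, "52:54:00:aa:bb:03",
    "redfish-virtualmedia://192.168.111.1/redfish/v1/Systems/1",
    "root", "password")
print(json.dumps(manifests, indent=2))
```

For example, you might redirect the output to a file such as bmh.json (a hypothetical name), review it, and then run oc -n openshift-machine-api create -f bmh.json.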

Additional resources

Replacing a bare-metal control plane node

Use the following procedure to replace an installer-provisioned OKD control plane node.

If you reuse the BareMetalHost object definition from an existing control plane host, do not leave the externallyProvisioned field set to true.

Existing control plane BareMetalHost objects may have the externallyProvisioned flag set to true if they were provisioned by the OKD installation program.
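
You might start from an existing control plane host definition, for example the JSON output of oc get bmh <host_name> -n openshift-machine-api -o json. The following Python sketch shows one way to clean that definition up before reuse:

* It drops the status and server-populated metadata.
* It optionally renames the host.
* It clears consumerRef.
* It forces externallyProvisioned to false.

Which fields to strip is an assumption, not a documented list. Review the result before you apply it.

```python
import copy

# Server-populated metadata; the choice of fields is an assumption.
SERVER_FIELDS = ("uid", "resourceVersion", "creationTimestamp", "generation",
                 "managedFields", "finalizers", "ownerReferences")


def reusable_bmh(existing, new_name=None):
    """Return a copy of a BareMetalHost definition that is safe to reapply."""
    bmh = copy.deepcopy(existing)  # never modify the caller's object
    bmh.pop("status", None)
    meta = bmh.setdefault("metadata", {})
    for field in SERVER_FIELDS:
        meta.pop(field, None)
    if new_name:
        meta["name"] = new_name
    spec = bmh.setdefault("spec", {})
    # The point of this helper: never carry externallyProvisioned: true over.
    spec["externallyProvisioned"] = False
    # Assumption: a stale consumerRef would still point at the old Machine.
    spec.pop("consumerRef", None)
    return bmh
```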

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

  • You have taken an etcd backup.

    Take an etcd backup before performing this procedure so that you can restore your cluster if you encounter any issues. For more information about taking an etcd backup, see the Additional resources section.

Procedure

  1. Ensure that the Bare Metal Operator is available:

    $ oc get clusteroperator baremetal

    Example output

    NAME        VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    baremetal   4.12.0    True        False         False      3d15h

  2. Remove the old BareMetalHost and Machine objects:

    $ oc delete bmh -n openshift-machine-api <host_name>
    $ oc delete machine -n openshift-machine-api <machine_name>

    Replace <host_name> with the name of the host and <machine_name> with the name of the machine. The machine name appears under the CONSUMER field.

    After you remove the BareMetalHost and Machine objects, the machine controller automatically deletes the Node object.

  3. Create the new BareMetalHost object and the secret to store the BMC credentials:

    $ cat <<EOF | oc apply -f -
    apiVersion: v1
    kind: Secret
    metadata:
      name: control-plane-<num>-bmc-secret (1)
      namespace: openshift-machine-api
    data:
      username: <base64_of_uid> (2)
      password: <base64_of_pwd> (3)
    type: Opaque
    ---
    apiVersion: metal3.io/v1alpha1
    kind: BareMetalHost
    metadata:
      name: control-plane-<num> (1)
      namespace: openshift-machine-api
    spec:
      automatedCleaningMode: disabled
      bmc:
        address: <protocol>://<bmc_ip> (4)
        credentialsName: control-plane-<num>-bmc-secret (1)
      bootMACAddress: <NIC1_mac_address> (5)
      bootMode: UEFI
      externallyProvisioned: false
      hardwareProfile: unknown
      online: true
    EOF

    (1) Replace <num> with the control plane number of the bare metal node in the name fields and the credentialsName field.
    (2) Replace <base64_of_uid> with the base64 string of the user name.
    (3) Replace <base64_of_pwd> with the base64 string of the password.
    (4) Replace <protocol> with the BMC protocol, such as redfish, redfish-virtualmedia, idrac-virtualmedia, or others. Replace <bmc_ip> with the IP address of the bare metal node’s baseboard management controller. For additional BMC configuration options, see “BMC addressing” in the Additional resources section.
    (5) Replace <NIC1_mac_address> with the MAC address of the bare metal node’s first NIC.

    After inspection completes, the BareMetalHost object is available to be provisioned.

  4. View available BareMetalHost objects:

    $ oc get bmh -n openshift-machine-api

    Example output

    NAME                          STATE                    CONSUMER          ONLINE   ERROR   AGE
    control-plane-1.example.com   available                control-plane-1   true             1h10m
    control-plane-2.example.com   externally provisioned   control-plane-2   true             4h53m
    control-plane-3.example.com   externally provisioned   control-plane-3   true             4h53m
    compute-1.example.com         provisioned              compute-1-ktmmx   true             4h53m
    compute-2.example.com         provisioned              compute-2-l2zmb   true             4h53m

    There are no MachineSet objects for control plane nodes, so you must create a Machine object instead. You can copy the providerSpec from another control plane Machine object.

  5. Create a Machine object:

    $ cat <<EOF | oc apply -f -
    apiVersion: machine.openshift.io/v1beta1
    kind: Machine
    metadata:
      annotations:
        metal3.io/BareMetalHost: openshift-machine-api/control-plane-<num> (1)
      labels:
        machine.openshift.io/cluster-api-cluster: control-plane-<num> (1)
        machine.openshift.io/cluster-api-machine-role: master
        machine.openshift.io/cluster-api-machine-type: master
      name: control-plane-<num> (1)
      namespace: openshift-machine-api
    spec:
      metadata: {}
      providerSpec:
        value:
          apiVersion: baremetal.cluster.k8s.io/v1alpha1
          customDeploy:
            method: install_coreos
          hostSelector: {}
          image:
            checksum: ""
            url: ""
          kind: BareMetalMachineProviderSpec
          metadata:
            creationTimestamp: null
          userData:
            name: master-user-data-managed
    EOF

    (1) Replace <num> with the control plane number of the bare metal node in the name, labels, and annotations fields.

  6. To view the BareMetalHost objects, run the following command:

    $ oc get bmh -A

    Example output

    NAME                          STATE                    CONSUMER          ONLINE   ERROR   AGE
    control-plane-1.example.com   provisioned              control-plane-1   true             2h53m
    control-plane-2.example.com   externally provisioned   control-plane-2   true             5h53m
    control-plane-3.example.com   externally provisioned   control-plane-3   true             5h53m
    compute-1.example.com         provisioned              compute-1-ktmmx   true             5h53m
    compute-2.example.com         provisioned              compute-2-l2zmb   true             5h53m

  7. After the RHCOS installation completes, verify that the new node has joined the cluster:

    $ oc get nodes

    Example output

    NAME                          STATUS   ROLES    AGE    VERSION
    control-plane-1.example.com   Ready    master   4m2s   v1.18.2
    control-plane-2.example.com   Ready    master   141m   v1.18.2
    control-plane-3.example.com   Ready    master   141m   v1.18.2
    compute-1.example.com         Ready    worker   87m    v1.18.2
    compute-2.example.com         Ready    worker   87m    v1.18.2

    After you replace the control plane node, the etcd pod running on the new node is in CrashLoopBackOff status. See “Replacing an unhealthy etcd member” in the Additional resources section for more information.
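
Step 5 notes that you can copy the providerSpec from another control plane Machine object. Below is a minimal Python sketch of that copy. It assumes you fetched an existing control plane Machine as JSON, for example with oc get machine <machine_name> -n openshift-machine-api -o json. It builds the new Machine with the BareMetalHost annotation from step 5 and deep-copies the providerSpec. It also copies the existing Machine's labels instead of using placeholder values, so review them before you apply the result.

```python
import copy
import json


def control_plane_machine(template, num):
    """Build a Machine named control-plane-<num> by copying the providerSpec
    and labels of an existing control plane Machine."""
    name = f"control-plane-{num}"
    return {
        "apiVersion": "machine.openshift.io/v1beta1",
        "kind": "Machine",
        "metadata": {
            "name": name,
            "namespace": "openshift-machine-api",
            "annotations": {
                # Links the Machine to the BareMetalHost created in step 3.
                "metal3.io/BareMetalHost": f"openshift-machine-api/{name}",
            },
            "labels": dict(template["metadata"].get("labels", {})),
        },
        "spec": {
            "metadata": {},
            # Deep copy so edits to the new Machine never touch the template.
            "providerSpec": copy.deepcopy(template["spec"]["providerSpec"]),
        },
    }


# Trimmed example of an existing control plane Machine (illustrative values).
existing = {
    "metadata": {"name": "ostest-master-0", "labels": {
        "machine.openshift.io/cluster-api-machine-role": "master",
        "machine.openshift.io/cluster-api-machine-type": "master"}},
    "spec": {"providerSpec": {"value": {
        "kind": "BareMetalMachineProviderSpec",
        "customDeploy": {"method": "install_coreos"},
        "userData": {"name": "master-user-data-managed"}}}},
}
print(json.dumps(control_plane_machine(existing, 1), indent=2))
```

The printed JSON could then be piped to oc apply -f -, in the same way as the heredoc in step 5.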

Additional resources

Preparing to deploy with Virtual Media on the baremetal network

If the provisioning network is enabled and you want to expand the cluster using Virtual Media on the baremetal network, use the following procedure.

Prerequisites

  • There is an existing cluster with a baremetal network and a provisioning network.
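
Step 1 of the following procedure edits the Provisioning CR interactively with oc edit. As a non-interactive alternative, you can express the same single-field change as a JSON merge patch. The following Python sketch only assembles the oc patch command and does not run it. The resource name provisioning-configuration comes from the example CR in step 1.

```python
import json
import shlex


def virtual_media_patch():
    """JSON merge patch that changes only spec.virtualMediaViaExternalNetwork."""
    return json.dumps({"spec": {"virtualMediaViaExternalNetwork": True}})


def patch_command(name="provisioning-configuration"):
    """Return the oc invocation as an argv list, e.g. for subprocess.run()."""
    return ["oc", "patch", "provisioning", name,
            "--type", "merge", "-p", virtual_media_patch()]


# Print a shell-quoted command line you can review before running it.
print(shlex.join(patch_command()))
```

The printed command is oc patch provisioning provisioning-configuration --type merge -p '{"spec": {"virtualMediaViaExternalNetwork": true}}'.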

Procedure

  1. Edit the provisioning custom resource (CR) to enable deploying with Virtual Media on the baremetal network:

    $ oc edit provisioning

    apiVersion: metal3.io/v1alpha1
    kind: Provisioning
    metadata:
      creationTimestamp: "2021-08-05T18:51:50Z"
      finalizers:
      - provisioning.metal3.io
      generation: 8
      name: provisioning-configuration
      resourceVersion: "551591"
      uid: f76e956f-24c6-4361-aa5b-feaf72c5b526
    spec:
      provisioningDHCPRange: 172.22.0.10,172.22.0.254
      provisioningIP: 172.22.0.3
      provisioningInterface: enp1s0
      provisioningNetwork: Managed
      provisioningNetworkCIDR: 172.22.0.0/24
      virtualMediaViaExternalNetwork: true (1)
    status:
      generations:
      - group: apps
        hash: ""
        lastGeneration: 7
        name: metal3
        namespace: openshift-machine-api
        resource: deployments
      - group: apps
        hash: ""
        lastGeneration: 1
        name: metal3-image-cache
        namespace: openshift-machine-api
        resource: daemonsets
      observedGeneration: 8
      readyReplicas: 0

    (1) Add virtualMediaViaExternalNetwork: true to the provisioning CR.

  2. If the image URL exists, edit the MachineSet object to use the API VIP address. This step applies only to clusters installed with version 4.9 or earlier.

    $ oc edit machineset

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      creationTimestamp: "2021-08-05T18:51:52Z"
      generation: 11
      labels:
        machine.openshift.io/cluster-api-cluster: ostest-hwmdt
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
      name: ostest-hwmdt-worker-0
      namespace: openshift-machine-api
      resourceVersion: "551513"
      uid: fad1c6e0-b9da-4d4a-8d73-286f78788931
    spec:
      replicas: 2
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-cluster: ostest-hwmdt
          machine.openshift.io/cluster-api-machineset: ostest-hwmdt-worker-0
      template:
        metadata:
          labels:
            machine.openshift.io/cluster-api-cluster: ostest-hwmdt
            machine.openshift.io/cluster-api-machine-role: worker
            machine.openshift.io/cluster-api-machine-type: worker
            machine.openshift.io/cluster-api-machineset: ostest-hwmdt-worker-0
        spec:
          metadata: {}
          providerSpec:
            value:
              apiVersion: baremetal.cluster.k8s.io/v1alpha1
              hostSelector: {}
              image:
                checksum: http://172.22.0.3:6181/images/rhcos-<version>.<architecture>.qcow2.<md5sum> (1)
                url: http://172.22.0.3:6181/images/rhcos-<version>.<architecture>.qcow2 (2)
              kind: BareMetalMachineProviderSpec
              metadata:
                creationTimestamp: null
              userData:
                name: worker-user-data
    status:
      availableReplicas: 2
      fullyLabeledReplicas: 2
      observedGeneration: 11
      readyReplicas: 2
      replicas: 2

    (1) Edit the checksum URL to use the API VIP address.
    (2) Edit the url field to use the API VIP address.
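
Step 2 swaps the host part of the checksum and url values for the API VIP address. The following Python sketch performs that substitution on a single URL. The assumptions are:

* 192.168.111.5 stands in for your API VIP.
* IPv6 VIPs are bracketed.
* The scheme, port, and path are kept unchanged. Whether the same port applies behind the API VIP is something to confirm for your cluster.

```python
from urllib.parse import urlsplit, urlunsplit


def use_api_vip(url, api_vip):
    """Swap the host in an image or checksum URL for the API VIP.

    Scheme, port, and path are kept as they are; keeping the port is an
    assumption to confirm for your cluster.
    """
    parts = urlsplit(url)
    host = f"[{api_vip}]" if ":" in api_vip else api_vip  # bracket IPv6 literals
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit(parts._replace(netloc=netloc))


# 192.168.111.5 stands in for your API VIP.
print(use_api_vip(
    "http://172.22.0.3:6181/images/rhcos-<version>.<architecture>.qcow2",
    "192.168.111.5"))
```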

Diagnosing a duplicate MAC address when provisioning a new host in the cluster

If the MAC address of an existing bare-metal node in the cluster matches the MAC address of a bare-metal host you are attempting to add to the cluster, the Bare Metal Operator associates the host with the existing node. If the host enrollment, inspection, cleaning, or other Ironic steps fail, the Bare Metal Operator retries the installation continuously. A registration error is displayed for the failed bare-metal host.

You can diagnose a duplicate MAC address by examining the bare-metal hosts that are running in the openshift-machine-api namespace.
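
Before you create a new host, you can check its boot MAC address against the hosts that already exist. The following Python sketch assumes you have loaded the items list from oc get bmh -n openshift-machine-api -o json. It compares the candidate MAC address, case-insensitively, against each host's spec.bootMACAddress. Where present, it also checks the NIC MAC addresses recorded in status.hardware.nics during inspection. It only approximates the check that Ironic performs.

```python
def host_macs(host):
    """Collect the MAC addresses known for one BareMetalHost object."""
    macs = {(host.get("spec") or {}).get("bootMACAddress", "")}
    # NICs discovered during inspection, if the host has been inspected.
    hardware = (host.get("status") or {}).get("hardware") or {}
    for nic in hardware.get("nics") or []:
        macs.add(nic.get("mac", ""))
    return {m.strip().lower() for m in macs if m}


def find_mac_conflict(bmh_items, new_mac):
    """Return the name of an existing host that already uses new_mac, or None."""
    wanted = new_mac.strip().lower()
    for host in bmh_items:
        if wanted in host_macs(host):
            return host["metadata"]["name"]
    return None
```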

Prerequisites

  • Install an OKD cluster on bare metal.

  • Install the OKD CLI oc.

  • Log in as a user with cluster-admin privileges.

Procedure

To determine whether a bare-metal host that fails provisioning has the same MAC address as an existing node, do the following:

  1. Get the bare-metal hosts running in the openshift-machine-api namespace:

    $ oc get bmh -n openshift-machine-api

    Example output

    NAME                 STATUS   PROVISIONING STATUS      CONSUMER
    openshift-master-0   OK       externally provisioned   openshift-zpwpq-master-0
    openshift-master-1   OK       externally provisioned   openshift-zpwpq-master-1
    openshift-master-2   OK       externally provisioned   openshift-zpwpq-master-2
    openshift-worker-0   OK       provisioned              openshift-zpwpq-worker-0-lv84n
    openshift-worker-1   OK       provisioned              openshift-zpwpq-worker-0-zd8lm
    openshift-worker-2   error    registering

  2. To see more detailed information about the status of the failing host, run the following command, replacing <bare_metal_host_name> with the name of the host:

    $ oc get -n openshift-machine-api bmh <bare_metal_host_name> -o yaml

    Example output

    ...
    status:
      errorCount: 12
      errorMessage: MAC address b4:96:91:1d:7c:20 conflicts with existing node openshift-worker-1
      errorType: registration error
    ...
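
When many hosts are involved, you can filter the same information out of oc get bmh -n openshift-machine-api -o json instead of reading each host's YAML. In the following Python sketch, the regular expression is modeled on the example errorMessage above and is an assumption. Registration errors with a different message format are still reported, with the MAC address and node left as None.

```python
import re

# Modeled on the example errorMessage; other Ironic messages may differ.
CONFLICT = re.compile(
    r"MAC address (?P<mac>[0-9a-fA-F:]{17}) conflicts with existing node (?P<node>\S+)")


def registration_errors(bmh_items):
    """Yield (host, mac, conflicting_node) for hosts reporting a registration error."""
    for host in bmh_items:
        status = host.get("status") or {}
        if status.get("errorType") != "registration error":
            continue
        match = CONFLICT.search(status.get("errorMessage", ""))
        yield (host["metadata"]["name"],
               match.group("mac") if match else None,
               match.group("node") if match else None)
```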

Provisioning the bare metal node

Provisioning the bare metal node requires executing the following procedure from the provisioner node.

Procedure

  1. Ensure the STATE is available before provisioning the bare metal node.

    $ oc -n openshift-machine-api get bmh openshift-worker-<num>

    Where <num> is the worker node number.

    Example output

    NAME                     STATE       ONLINE   ERROR   AGE
    openshift-worker-<num>   available   true             34h

  2. Get a count of the number of worker nodes.

    $ oc get nodes

    Example output

    NAME                                       STATUS   ROLES    AGE   VERSION
    openshift-master-1.openshift.example.com   Ready    master   30h   v1.26.0
    openshift-master-2.openshift.example.com   Ready    master   30h   v1.26.0
    openshift-master-3.openshift.example.com   Ready    master   30h   v1.26.0
    openshift-worker-0.openshift.example.com   Ready    worker   30h   v1.26.0
    openshift-worker-1.openshift.example.com   Ready    worker   30h   v1.26.0

  3. Get the compute machine set.

    $ oc get machinesets -n openshift-machine-api

    Example output

    NAME                             DESIRED   CURRENT   READY   AVAILABLE   AGE
    ...
    openshift-worker-0.example.com   1         1         1       1           55m
    openshift-worker-1.example.com   1         1         1       1           55m

  4. Increase the number of worker nodes by one.

    $ oc scale --replicas=<num> machineset <machineset> -n openshift-machine-api

    Replace <num> with the new number of worker nodes. Replace <machineset> with the name of the compute machine set from the previous step.

  5. Check the status of the bare metal node.

    $ oc -n openshift-machine-api get bmh openshift-worker-<num>

    Where <num> is the worker node number. The STATE changes from available to provisioning.

    NAME                     STATE          CONSUMER                       ONLINE   ERROR
    openshift-worker-<num>   provisioning   openshift-worker-<num>-65tjz   true

    The node remains in the provisioning state until the OKD cluster provisions it, which can take 30 minutes or more. After the node is provisioned, the state changes to provisioned.

    NAME                     STATE          CONSUMER                       ONLINE   ERROR
    openshift-worker-<num>   provisioned    openshift-worker-<num>-65tjz   true

  6. After provisioning completes, ensure the bare metal node is ready.

    $ oc get nodes

    Example output

    NAME                                           STATUS   ROLES    AGE     VERSION
    openshift-master-1.openshift.example.com       Ready    master   30h     v1.26.0
    openshift-master-2.openshift.example.com       Ready    master   30h     v1.26.0
    openshift-master-3.openshift.example.com       Ready    master   30h     v1.26.0
    openshift-worker-0.openshift.example.com       Ready    worker   30h     v1.26.0
    openshift-worker-1.openshift.example.com       Ready    worker   30h     v1.26.0
    openshift-worker-<num>.openshift.example.com   Ready    worker   3m27s   v1.26.0

    You can also check the kubelet logs on the new node:

    $ ssh openshift-worker-<num>
    [kni@openshift-worker-<num>]$ journalctl -fu kubelet
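
Steps 2 and 6 both read the oc get nodes table. The following small Python sketch supports scripting those checks. It parses the default table, counts the nodes that carry the worker role, and tests whether a given node reports Ready. Splitting on whitespace works only because no value in this table contains spaces. For anything beyond a quick script, oc get nodes -o json is a sturdier input than the text table.

```python
def parse_nodes(table):
    """Parse the default `oc get nodes` table into a list of dicts."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = lines[0].split()
    # Works because no column value in this table contains spaces.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def worker_count(nodes):
    # ROLES is comma-separated, for example "master,worker" on compact clusters.
    return sum("worker" in n["ROLES"].split(",") for n in nodes)


def is_ready(nodes, name):
    # STATUS can carry extra conditions, for example "Ready,SchedulingDisabled".
    return any(n["NAME"] == name and "Ready" in n["STATUS"].split(",") for n in nodes)
```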