Placing pods on specific nodes using node selectors

A node selector specifies a map of key/value pairs that are defined using custom labels on nodes and selectors specified in pods.

For the pod to be eligible to run on a node, the pod must have the same key/value node selector as the label on the node.

About node selectors

You can use node selectors on pods and labels on nodes to control where the pod is scheduled. With node selectors, OKD schedules the pods on nodes that contain matching labels.

You can use a node selector to place specific pods on specific nodes, cluster-wide node selectors to place new pods on specific nodes anywhere in the cluster, and project node selectors to place new pods in a project on specific nodes.

For example, as a cluster administrator, you can create an infrastructure where application developers can deploy pods only onto the nodes closest to their geographical location by including a node selector in every pod they create. In this example, the cluster consists of five data centers spread across two regions. In the U.S., label the nodes as us-east, us-central, or us-west. In the Asia-Pacific region (APAC), label the nodes as apac-east or apac-west. The developers can add a node selector to the pods they create to ensure the pods get scheduled on those nodes.

A pod is not scheduled if the Pod object contains a node selector, but no node has a matching label.

If you are using node selectors and node affinity in the same pod configuration, the following rules control pod placement onto nodes:

  • If you configure both nodeSelector and nodeAffinity, both conditions must be satisfied for the pod to be scheduled onto a candidate node.

  • If you specify multiple nodeSelectorTerms associated with nodeAffinity types, then the pod can be scheduled onto a node if one of the nodeSelectorTerms is satisfied.

  • If you specify multiple matchExpressions associated with nodeSelectorTerms, then the pod can be scheduled onto a node only if all matchExpressions are satisfied.

Node selectors on specific pods and nodes

You can control which node a specific pod is scheduled on by using node selectors and labels.

To use node selectors and labels, first label the node to avoid pods being descheduled, then add the node selector to the pod.

You cannot add a node selector directly to an existing scheduled pod. You must label the object that controls the pod, such as deployment config.

For example, the following Node object has the region: east label:

Sample Node object with a label

  1. kind: Node
  2. apiVersion: v1
  3. metadata:
  4. name: ip-10-0-131-14.ec2.internal
  5. selfLink: /api/v1/nodes/ip-10-0-131-14.ec2.internal
  6. uid: 7bc2580a-8b8e-11e9-8e01-021ab4174c74
  7. resourceVersion: '478704'
  8. creationTimestamp: '2019-06-10T14:46:08Z'
  9. labels:
  10. kubernetes.io/os: linux
  11. failure-domain.beta.kubernetes.io/zone: us-east-1a
  12. node.openshift.io/os_version: '4.5'
  13. node-role.kubernetes.io/worker: ''
  14. failure-domain.beta.kubernetes.io/region: us-east-1
  15. node.openshift.io/os_id: fedora
  16. beta.kubernetes.io/instance-type: m4.large
  17. kubernetes.io/hostname: ip-10-0-131-14
  18. beta.kubernetes.io/arch: amd64
  19. region: east (1)
1Label to match the pod node selector.

A pod has the type: user-node,region: east node selector:

Sample Pod object with node selectors

  1. apiVersion: v1
  2. kind: Pod
  3. ....
  4. spec:
  5. nodeSelector: (1)
  6. region: east
  7. type: user-node
1Node selectors to match the node label.

When you create the pod using the example pod spec, it can be scheduled on the example node.

Default cluster-wide node selectors

With default cluster-wide node selectors, when you create a pod in that cluster, OKD adds the default node selectors to the pod and schedules the pod on nodes with matching labels.

For example, the following Scheduler object has the default cluster-wide region=east and type=user-node node selectors:

Example Scheduler Operator Custom Resource

  1. apiVersion: config.openshift.io/v1
  2. kind: Scheduler
  3. metadata:
  4. name: cluster
  5. ...
  6. spec:
  7. defaultNodeSelector: type=user-node,region=east
  8. ...

A node in that cluster has the type=user-node,region=east labels:

Example Node object

  1. apiVersion: v1
  2. kind: Node
  3. metadata:
  4. name: ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4
  5. ...
  6. labels:
  7. region: east
  8. type: user-node
  9. ...

Example Pod object with a node selector

  1. apiVersion: v1
  2. kind: Pod
  3. ...
  4. spec:
  5. nodeSelector:
  6. region: east
  7. ...

When you create the pod using the example pod spec in the example cluster, the pod is created with the cluster-wide node selector and is scheduled on the labeled node:

Example pod list with the pod on the labeled node

  1. NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
  2. pod-s1 1/1 Running 0 20s 10.131.2.6 ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4 <none> <none>

If the project where you create the pod has a project node selector, that selector takes preference over a cluster-wide node selector. Your pod is not created or scheduled if the pod does not have the project node selector.

Project node selectors

With project node selectors, when you create a pod in this project, OKD adds the node selectors to the pod and schedules the pods on a node with matching labels. If there is a cluster-wide default node selector, a project node selector takes preference.

For example, the following project has the region=east node selector:

Example Namespace object

  1. apiVersion: v1
  2. kind: Namespace
  3. metadata:
  4. name: east-region
  5. annotations:
  6. openshift.io/node-selector: "region=east"
  7. ...

The following node has the type=user-node,region=east labels:

Example Node object

  1. apiVersion: v1
  2. kind: Node
  3. metadata:
  4. name: ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4
  5. ...
  6. labels:
  7. region: east
  8. type: user-node
  9. ...

When you create the pod using the example pod spec in this example project, the pod is created with the project node selectors and is scheduled on the labeled node:

Example Pod object

  1. apiVersion: v1
  2. kind: Pod
  3. metadata:
  4. namespace: east-region
  5. ...
  6. spec:
  7. nodeSelector:
  8. region: east
  9. type: user-node
  10. ...

Example pod list with the pod on the labeled node

  1. NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
  2. pod-s1 1/1 Running 0 20s 10.131.2.6 ci-ln-qg1il3k-f76d1-hlmhl-worker-b-df2s4 <none> <none>

A pod in the project is not created or scheduled if the pod contains different node selectors. For example, if you deploy the following pod into the example project, it is not be created:

Example Pod object with an invalid node selector

  1. apiVersion: v1
  2. kind: Pod
  3. ...
  4. spec:
  5. nodeSelector:
  6. region: west
  7. ....

Using node selectors to control pod placement

You can use node selectors on pods and labels on nodes to control where the pod is scheduled. With node selectors, OKD schedules the pods on nodes that contain matching labels.

You add labels to a node, a machine set, or a machine config. Adding the label to the machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.

To add node selectors to an existing pod, add a node selector to the controlling object for that pod, such as a ReplicaSet object, DaemonSet object, StatefulSet object, Deployment object, or DeploymentConfig object. Any existing pods under that controlling object are recreated on a node with a matching label. If you are creating a new pod, you can add the node selector directly to the Pod spec.

You cannot add a node selector directly to an existing scheduled pod.

Prerequisites

To add a node selector to existing pods, determine the controlling object for that pod. For example, the router-default-66d5cf9464-m2g75 pod is controlled by the router-default-66d5cf9464 replica set:

  1. $ oc describe pod router-default-66d5cf9464-7pwkc
  2. Name: router-default-66d5cf9464-7pwkc
  3. Namespace: openshift-ingress
  4. ....
  5. Controlled By: ReplicaSet/router-default-66d5cf9464

The web console lists the controlling object under ownerReferences in the pod YAML:

  1. ownerReferences:
  2. - apiVersion: apps/v1
  3. kind: ReplicaSet
  4. name: router-default-66d5cf9464
  5. uid: d81dd094-da26-11e9-a48a-128e7edf0312
  6. controller: true
  7. blockOwnerDeletion: true

Procedure

  1. Add labels to a node by using a machine set or editing the node directly:

    • Use a MachineSet object to add labels to nodes managed by the machine set when a node is created:

      1. Run the following command to add labels to a MachineSet object:

        1. $ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>"="<value>","<key>"="<value>"}}]' -n openshift-machine-api

        For example:

        1. $ oc patch MachineSet abc612-msrtw-worker-us-east-1c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
      2. Verify that the labels are added to the MachineSet object by using the oc edit command:

        For example:

        1. $ oc edit MachineSet abc612-msrtw-worker-us-east-1c -n openshift-machine-api

        Example MachineSet object

        1. apiVersion: machine.openshift.io/v1beta1
        2. kind: MachineSet
        3. ....
        4. spec:
        5. ...
        6. template:
        7. metadata:
        8. ...
        9. spec:
        10. metadata:
        11. labels:
        12. region: east
        13. type: user-node
        14. ....
    • Add labels directly to a node:

      1. Edit the Node object for the node:

        1. $ oc label nodes <name> <key>=<value>

        For example, to label a node:

        1. $ oc label nodes ip-10-0-142-25.ec2.internal type=user-node region=east
      2. Verify that the labels are added to the node:

        1. $ oc get nodes -l type=user-node,region=east

        Example output

        1. NAME STATUS ROLES AGE VERSION
        2. ip-10-0-142-25.ec2.internal Ready worker 17m v1.22.1
  1. Add the matching node selector to a pod:

    • To add a node selector to existing and future pods, add a node selector to the controlling object for the pods:

      Example ReplicaSet object with labels

      1. kind: ReplicaSet
      2. ....
      3. spec:
      4. ....
      5. template:
      6. metadata:
      7. creationTimestamp: null
      8. labels:
      9. ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
      10. pod-template-hash: 66d5cf9464
      11. spec:
      12. nodeSelector:
      13. kubernetes.io/os: linux
      14. node-role.kubernetes.io/worker: ''
      15. type: user-node (1)
      1Add the node selector.
    • To add a node selector to a specific, new pod, add the selector to the Pod object directly:

      Example Pod object with a node selector

      1. apiVersion: v1
      2. kind: Pod
      3. ....
      4. spec:
      5. nodeSelector:
      6. region: east
      7. type: user-node

      You cannot add a node selector directly to an existing scheduled pod.

Creating default cluster-wide node selectors

You can use default cluster-wide node selectors on pods together with labels on nodes to constrain all pods created in a cluster to specific nodes.

With cluster-wide node selectors, when you create a pod in that cluster, OKD adds the default node selectors to the pod and schedules the pod on nodes with matching labels.

You configure cluster-wide node selectors by editing the Scheduler Operator custom resource (CR). You add labels to a node, a machine set, or a machine config. Adding the label to the machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.

You can add additional key/value pairs to a pod. But you cannot add a different value for a default key.

Procedure

To add a default cluster-wide node selector:

  1. Edit the Scheduler Operator CR to add the default cluster-wide node selectors:

    1. $ oc edit scheduler cluster

    Example Scheduler Operator CR with a node selector

    1. apiVersion: config.openshift.io/v1
    2. kind: Scheduler
    3. metadata:
    4. name: cluster
    5. ...
    6. spec:
    7. defaultNodeSelector: type=user-node,region=east (1)
    8. mastersSchedulable: false
    9. policy:
    10. name: ""
    1Add a node selector with the appropriate <key>:<value> pairs.

    After making this change, wait for the pods in the openshift-kube-apiserver project to redeploy. This can take several minutes. The default cluster-wide node selector does not take effect until the pods redeploy.

  2. Add labels to a node by using a machine set or editing the node directly:

    • Use a machine set to add labels to nodes managed by the machine set when a node is created:

      1. Run the following command to add labels to a MachineSet object:

        1. $ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>"="<value>","<key>"="<value>"}}]' -n openshift-machine-api (1)
        1Add a <key>/<value> pair for each label.

        For example:

        1. $ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
      2. Verify that the labels are added to the MachineSet object by using the oc edit command:

        For example:

        1. $ oc edit MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api

        Example output

        1. apiVersion: machine.openshift.io/v1beta1
        2. kind: MachineSet
        3. metadata:
        4. ...
        5. spec:
        6. ...
        7. template:
        8. metadata:
        9. ...
        10. spec:
        11. metadata:
        12. labels:
        13. region: east
        14. type: user-node
      3. Redeploy the nodes associated with that machine set by scaling down to 0 and scaling up the nodes:

        For example:

        1. $ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
        1. $ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
      4. When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:

        1. $ oc get nodes -l <key>=<value>

        For example:

        1. $ oc get nodes -l type=user-node

        Example output

        1. NAME STATUS ROLES AGE VERSION
        2. ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp Ready worker 61s v1.22.1
    • Add labels directly to a node:

      1. Edit the Node object for the node:

        1. $ oc label nodes <name> <key>=<value>

        For example, to label a node:

        1. $ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 type=user-node region=east
      2. Verify that the labels are added to the node using the oc get command:

        1. $ oc get nodes -l <key>=<value>,<key>=<value>

        For example:

        1. $ oc get nodes -l type=user-node,region=east

        Example output

        1. NAME STATUS ROLES AGE VERSION
        2. ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 Ready worker 17m v1.22.1

Creating project-wide node selectors

You can use node selectors in a project together with labels on nodes to constrain all pods created in that project to the labeled nodes.

When you create a pod in this project, OKD adds the node selectors to the pods in the project and schedules the pods on a node with matching labels in the project. If there is a cluster-wide default node selector, a project node selector takes preference.

You add node selectors to a project by editing the Namespace object to add the openshift.io/node-selector parameter. You add labels to a node, a machine set, or a machine config. Adding the label to the machine set ensures that if the node or machine goes down, new nodes have the label. Labels added to a node or machine config do not persist if the node or machine goes down.

A pod is not scheduled if the Pod object contains a node selector, but no project has a matching node selector. When you create a pod from that spec, you receive an error similar to the following message:

Example error message

  1. Error from server (Forbidden): error when creating "pod.yaml": pods "pod-4" is forbidden: pod node label selector conflicts with its project node label selector

You can add additional key/value pairs to a pod. But you cannot add a different value for a project key.

Procedure

To add a default project node selector:

  1. Create a project or edit an existing project to add the openshift.io/node-selector parameter:

    1. $ oc edit project <name>

    Example output

    1. apiVersion: project.openshift.io/v1
    2. kind: Project
    3. metadata:
    4. annotations:
    5. openshift.io/node-selector: "type=user-node,region=east" (1)
    6. openshift.io/description: ""
    7. openshift.io/display-name: ""
    8. openshift.io/requester: kube:admin
    9. openshift.io/sa.scc.mcs: s0:c30,c5
    10. openshift.io/sa.scc.supplemental-groups: 1000880000/10000
    11. openshift.io/sa.scc.uid-range: 1000880000/10000
    12. creationTimestamp: "2021-05-10T12:35:04Z"
    13. labels:
    14. kubernetes.io/metadata.name: demo
    15. name: demo
    16. resourceVersion: "145537"
    17. uid: 3f8786e3-1fcb-42e3-a0e3-e2ac54d15001
    18. spec:
    19. finalizers:
    20. - kubernetes
    1Add the openshift.io/node-selector with the appropriate <key>:<value> pairs.
  2. Add labels to a node by using a machine set or editing the node directly:

    • Use a MachineSet object to add labels to nodes managed by the machine set when a node is created:

      1. Run the following command to add labels to a MachineSet object:

        1. $ oc patch MachineSet <name> --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"<key>"="<value>","<key>"="<value>"}}]' -n openshift-machine-api

        For example:

        1. $ oc patch MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c --type='json' -p='[{"op":"add","path":"/spec/template/spec/metadata/labels", "value":{"type":"user-node","region":"east"}}]' -n openshift-machine-api
      2. Verify that the labels are added to the MachineSet object by using the oc edit command:

        For example:

        1. $ oc edit MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api

        Example output

        1. apiVersion: machine.openshift.io/v1beta1
        2. kind: MachineSet
        3. metadata:
        4. ...
        5. spec:
        6. ...
        7. template:
        8. metadata:
        9. ...
        10. spec:
        11. metadata:
        12. labels:
        13. region: east
        14. type: user-node
      3. Redeploy the nodes associated with that machine set:

        For example:

        1. $ oc scale --replicas=0 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
        1. $ oc scale --replicas=1 MachineSet ci-ln-l8nry52-f76d1-hl7m7-worker-c -n openshift-machine-api
      4. When the nodes are ready and available, verify that the label is added to the nodes by using the oc get command:

        1. $ oc get nodes -l <key>=<value>

        For example:

        1. $ oc get nodes -l type=user-node,region=east

        Example output

        1. NAME STATUS ROLES AGE VERSION
        2. ci-ln-l8nry52-f76d1-hl7m7-worker-c-vmqzp Ready worker 61s v1.22.1
    • Add labels directly to a node:

      1. Edit the Node object to add labels:

        1. $ oc label <resource> <name> <key>=<value>

        For example, to label a node:

        1. $ oc label nodes ci-ln-l8nry52-f76d1-hl7m7-worker-c-tgq49 type=user-node region=east
      2. Verify that the labels are added to the Node object using the oc get command:

        1. $ oc get nodes -l <key>=<value>

        For example:

        1. $ oc get nodes -l type=user-node,region=east

        Example output

        1. NAME STATUS ROLES AGE VERSION
        2. ci-ln-l8nry52-f76d1-hl7m7-worker-b-tgq49 Ready worker 17m v1.22.1

Additional resources