Managing nodes

Overview

You can manage nodes in your instance using the CLI.

When you perform node management operations, the CLI interacts with node objects that are representations of actual node hosts. The master uses the information from node objects to validate nodes with health checks.

Listing nodes

To list all nodes that are known to the master:

  1. $ oc get nodes
  2. NAME STATUS ROLES AGE VERSION
  3. master.example.com Ready master 7h v1.9.1+a0ce1bc657
  4. node1.example.com Ready compute 7h v1.9.1+a0ce1bc657
  5. node2.example.com Ready compute 7h v1.9.1+a0ce1bc657

To list all nodes in wide format, which adds details such as the external IP address, OS image, kernel version, and container runtime:

  1. $ oc get nodes -o wide
  2. NAME STATUS ROLES AGE VERSION EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
  3. ip-172-18-0-39.ec2.internal Ready infra 1d v1.10.0+b81c8f8 54.172.185.130 Red Hat Enterprise Linux Server 7.5 (Maipo) 3.10.0-862.el7.x86_64 docker://1.13.1
  4. ip-172-18-10-95.ec2.internal Ready master 1d v1.10.0+b81c8f8 54.88.22.81 Red Hat Enterprise Linux Server 7.5 (Maipo) 3.10.0-862.el7.x86_64 docker://1.13.1
  5. ip-172-18-8-35.ec2.internal Ready compute 1d v1.10.0+b81c8f8 34.230.50.57 Red Hat Enterprise Linux Server 7.5 (Maipo) 3.10.0-862.el7.x86_64 docker://1.13.1

To list only information about a single node, replace <node> with the full node name:

  1. $ oc get node <node>

The STATUS column in the output of these commands can show nodes with the following conditions:

Table 1. Node Conditions

Condition            Description
Ready                The node is passing the health checks performed from the master by returning StatusOK.
NotReady             The node is not passing the health checks performed from the master.
SchedulingDisabled   Pods cannot be scheduled for placement on the node.

The STATUS column can also show Unknown for a node if the CLI cannot find any node condition.

To get more detailed information about a specific node, including the reason for the current condition:

  1. $ oc describe node <node>

For example:

  1. $ oc describe node node1.example.com
  2. Name: node1.example.com (1)
  3. Roles: compute (2)
  4. Labels: beta.kubernetes.io/arch=amd64 (3)
  5. beta.kubernetes.io/os=linux
  6. kubernetes.io/hostname=m01.example.com
  7. node-role.kubernetes.io/compute=true
  8. node-role.kubernetes.io/infra=true
  9. node-role.kubernetes.io/master=true
  10. zone=default
  11. Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true (4)
  12. CreationTimestamp: Thu, 24 May 2018 11:46:56 -0400
  13. Taints: <none> (5)
  14. Unschedulable: false
  15. Conditions: (6)
  16. Type Status LastHeartbeatTime LastTransitionTime Reason Message
  17. ---- ------ ----------------- ------------------ ------ -------
  18. OutOfDisk False Tue, 17 Jul 2018 11:47:30 -0400 Tue, 10 Jul 2018 15:45:16 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available
  19. MemoryPressure False Tue, 17 Jul 2018 11:47:30 -0400 Tue, 10 Jul 2018 15:45:16 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available
  20. DiskPressure False Tue, 17 Jul 2018 11:47:30 -0400 Tue, 10 Jul 2018 16:03:54 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure
  21. Ready True Tue, 17 Jul 2018 11:47:30 -0400 Mon, 16 Jul 2018 15:10:25 -0400 KubeletReady kubelet is posting ready status
  22. PIDPressure False Tue, 17 Jul 2018 11:47:30 -0400 Thu, 05 Jul 2018 10:06:51 -0400 KubeletHasSufficientPID kubelet has sufficient PID available
  23. Addresses: (7)
  24. InternalIP: 192.168.122.248
  25. Hostname: node1.example.com
  26. Capacity: (8)
  27. cpu: 2
  28. hugepages-2Mi: 0
  29. memory: 8010336Ki
  30. pods: 40
  31. Allocatable:
  32. cpu: 2
  33. hugepages-2Mi: 0
  34. memory: 7907936Ki
  35. pods: 40
  36. System Info: (9)
  37. Machine ID: b3adb9acbc49fc1f9a7d6
  38. System UUID: B3ADB9A-B0CB-C49FC1F9A7D6
  39. Boot ID: 9359d15aec9-81a20aef5876
  40. Kernel Version: 3.10.0-693.21.1.el7.x86_64
  41. OS Image: OpenShift Enterprise
  42. Operating System: linux
  43. Architecture: amd64
  44. Container Runtime Version: docker://1.13.1
  45. Kubelet Version: v1.10.0+b81c8f8
  46. Kube-Proxy Version: v1.10.0+b81c8f8
  47. ExternalID: node1.example.com
  48. Non-terminated Pods: (14 in total) (10)
  49. Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
  50. --------- ---- ------------ ---------- --------------- -------------
  51. default docker-registry-2-w252l 100m (5%) 0 (0%) 256Mi (3%) 0 (0%)
  52. default registry-console-2-dpnc9 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  53. default router-2-5snb2 100m (5%) 0 (0%) 256Mi (3%) 0 (0%)
  54. kube-service-catalog apiserver-jh6gt 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  55. kube-service-catalog controller-manager-z4t5j 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  56. kube-system master-api-m01.example.com 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  57. kube-system master-controllers-m01.example.com 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  58. kube-system master-etcd-m01.example.com 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  59. openshift-ansible-service-broker asb-1-hnn5t 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  60. openshift-node sync-dvhvs 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  61. openshift-sdn ovs-zjs5k 100m (5%) 200m (10%) 300Mi (3%) 400Mi (5%)
  62. openshift-sdn sdn-zr4cb 100m (5%) 0 (0%) 200Mi (2%) 0 (0%)
  63. openshift-template-service-broker apiserver-s9n7t 0 (0%) 0 (0%) 0 (0%) 0 (0%)
  64. openshift-web-console webconsole-785689b664-q7s9j 100m (5%) 0 (0%) 100Mi (1%) 0 (0%)
  65. Allocated resources:
  66. (Total limits may be over 100 percent, i.e., overcommitted.)
  67. CPU Requests CPU Limits Memory Requests Memory Limits
  68. ------------ ---------- --------------- -------------
  69. 500m (25%) 200m (10%) 1112Mi (14%) 400Mi (5%)
  70. Events: (11)
  71. Type Reason Age From Message
  72. ---- ------ ---- ---- -------
  73. Normal NodeHasSufficientPID 6d (x5 over 6d) kubelet, m01.example.com Node m01.example.com status is now: NodeHasSufficientPID
  74. Normal NodeAllocatableEnforced 6d kubelet, m01.example.com Updated Node Allocatable limit across pods
  75. Normal NodeHasSufficientMemory 6d (x6 over 6d) kubelet, m01.example.com Node m01.example.com status is now: NodeHasSufficientMemory
  76. Normal NodeHasNoDiskPressure 6d (x6 over 6d) kubelet, m01.example.com Node m01.example.com status is now: NodeHasNoDiskPressure
  77. Normal NodeHasSufficientDisk 6d (x6 over 6d) kubelet, m01.example.com Node m01.example.com status is now: NodeHasSufficientDisk
  78. Normal NodeHasSufficientPID 6d kubelet, m01.example.com Node m01.example.com status is now: NodeHasSufficientPID
  79. Normal Starting 6d kubelet, m01.example.com Starting kubelet.
  80. ...
1The name of the node.
2The role of the node, either master, compute, or infra.
3The labels applied to the node.
4The annotations applied to the node.
5The taints applied to the node.
6Node conditions.
7The IP address and host name of the node.
8The capacity and allocatable resources of the node (CPU, memory, and maximum pods).
9Information about the node host.
10The pods on the node.
11The events reported by the node.

Viewing nodes

You can display usage statistics about nodes, which provide the runtime environments for containers. These usage statistics include CPU, memory, and storage consumption.

To view the usage statistics:

  1. $ oc adm top nodes
  2. NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
  3. node-1 297m 29% 4263Mi 55%
  4. node-0 55m 5% 1201Mi 15%
  5. infra-1 85m 8% 1319Mi 17%
  6. infra-0 182m 18% 2524Mi 32%
  7. master-0 178m 8% 2584Mi 16%

To view the usage statistics for nodes with labels:

  1. $ oc adm top node --selector=''

Specify the selector (label query) to filter on; the =, ==, and != operators are supported.
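
For example, assuming your nodes carry the node-role labels shown earlier in this topic, you can limit the output to compute nodes; the label used here is only an illustration:

  1. $ oc adm top node --selector='node-role.kubernetes.io/compute=true'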

You must have cluster-reader permission to view the usage statistics.

The metrics-server must be installed to view the usage statistics. See Requirements for Using Horizontal Pod Autoscalers.
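
If your user does not have the cluster-reader role, a cluster administrator can grant it. A sketch, with <user> as a placeholder:

  1. $ oc adm policy add-cluster-role-to-user cluster-reader <user>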

Adding hosts

You can add new hosts to your cluster by running the scaleup.yml playbook. This playbook queries the master, generates and distributes new certificates for the new hosts, and then runs the configuration playbooks on only the new hosts. Before running the scaleup.yml playbook, complete all prerequisite host preparation steps.

The scaleup.yml playbook configures only the new host. It does not update NO_PROXY in master services, and it does not restart master services.

You must have an existing inventory file, for example /etc/ansible/hosts, that is representative of your current cluster configuration in order to run the scaleup.yml playbook.

See the cluster maximums section for the recommended maximum number of nodes.

Procedure

  1. Ensure you have the latest playbooks by updating the openshift-ansible package:

    1. # yum update openshift-ansible
  2. Edit your /etc/ansible/hosts file and add new_<host_type> to the [OSEv3:children] section. For example, to add a new node host, add new_nodes:

    1. [OSEv3:children]
    2. masters
    3. nodes
    4. new_nodes

    To add new master hosts, add new_masters.

  3. Create a [new_<host_type>] section to specify host information for the new hosts. Format this section like an existing section, as shown in the following example of adding a new node:

    1. [nodes]
    2. master[1:3].example.com
    3. node1.example.com openshift_node_group_name='node-config-compute'
    4. node2.example.com openshift_node_group_name='node-config-compute'
    5. infra-node1.example.com openshift_node_group_name='node-config-infra'
    6. infra-node2.example.com openshift_node_group_name='node-config-infra'
    7. [new_nodes]
    8. node3.example.com openshift_node_group_name='node-config-infra'

    See Configuring Host Variables for more options.

    When adding new masters, add hosts to both the [new_masters] section and the [new_nodes] section to ensure that the new master host is part of the OpenShift SDN:

    1. [masters]
    2. master[1:2].example.com
    3. [new_masters]
    4. master3.example.com
    5. [nodes]
    6. master[1:2].example.com
    7. node1.example.com openshift_node_group_name='node-config-compute'
    8. node2.example.com openshift_node_group_name='node-config-compute'
    9. infra-node1.example.com openshift_node_group_name='node-config-infra'
    10. infra-node2.example.com openshift_node_group_name='node-config-infra'
    11. [new_nodes]
    12. master3.example.com

    If you label a master host with the node-role.kubernetes.io/infra=true label and have no other dedicated infrastructure nodes, you must also explicitly mark the host as schedulable by adding openshift_schedulable=true to the entry. Otherwise, the registry and router pods cannot be placed anywhere.
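
    For example, a hypothetical inventory entry for such a master host; the host name and node group name are illustrative:

    1. [new_nodes]
    2. master3.example.com openshift_node_group_name='node-config-master-infra' openshift_schedulable=true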

  4. Change to the playbook directory and run the openshift_node_group.yml playbook. If your inventory file is located somewhere other than the default of /etc/ansible/hosts, specify the location with the -i option:

    1. $ cd /usr/share/ansible/openshift-ansible
    2. $ ansible-playbook [-i /path/to/file] \
    3. playbooks/openshift-master/openshift_node_group.yml

    This creates the ConfigMap for the new node groups, and ultimately, the configuration file of the node on the host.

    Running the openshift_node_group.yml playbook only updates new nodes. It cannot be run to update existing nodes in a cluster.

  5. Run the scaleup.yml playbook. If your inventory file is located somewhere other than the default of /etc/ansible/hosts, specify the location with the -i option.

    • For additional nodes:

      1. $ ansible-playbook [-i /path/to/file] \
      2. playbooks/openshift-node/scaleup.yml
    • For additional masters:

      1. $ ansible-playbook [-i /path/to/file] \
      2. playbooks/openshift-master/scaleup.yml
  6. If you deployed the EFK stack in your cluster, set the node label to logging-infra-fluentd=true:

    1. # oc label node/new-node.example.com logging-infra-fluentd=true
  7. After the playbook runs, verify the installation, for example by confirming that the new host appears in the node list and reports Ready (see the sketch after this procedure).

  8. Move any hosts that you defined in the [new_<host_type>] section to their appropriate section. By moving these hosts, subsequent playbook runs that use this inventory file treat the nodes correctly. You can keep the empty [new_<host_type>] section. For example, when adding new nodes:

    1. [nodes]
    2. master[1:3].example.com
    3. node1.example.com openshift_node_group_name='node-config-compute'
    4. node2.example.com openshift_node_group_name='node-config-compute'
    5. node3.example.com openshift_node_group_name='node-config-compute'
    6. infra-node1.example.com openshift_node_group_name='node-config-infra'
    7. infra-node2.example.com openshift_node_group_name='node-config-infra'
    8. [new_nodes]
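
For the verification step, one simple check is that the new host appears in the node list and reports Ready; a sketch using the example node name from this procedure:

  1. $ oc get nodes
  2. $ oc get node node3.example.com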

Deleting nodes

When you delete a node using the CLI, the node object is deleted in Kubernetes, but the pods that exist on the node itself are not deleted. Any bare pods not backed by a replication controller become inaccessible to OKD, pods backed by replication controllers are rescheduled to other available nodes, and local manifest pods must be deleted manually.

To delete a node from the OKD cluster:

  1. Evacuate pods from the node you are preparing to delete, for example by marking it unschedulable and then draining it (see the sketch after this procedure and Evacuating pods on nodes).

  2. Delete the node object:

    1. $ oc delete node <node>
  3. Check that the node has been removed from the node list:

    1. $ oc get nodes

    Pods are now scheduled only on the remaining nodes that are in the Ready state.

  4. If you want to uninstall all OKD content from the node host, including all pods and containers, continue to Uninstalling Nodes and follow the procedure using the uninstall.yml playbook. The procedure assumes general understanding of the cluster installation process using Ansible.
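
A minimal sketch of the evacuation in step 1, using a node name from the examples in this topic; the drain options you need can differ:

  1. $ oc adm manage-node node1.example.com --schedulable=false
  2. $ oc adm drain node1.example.com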

Updating labels on nodes

To add or update labels on a node:

  1. $ oc label node <node> <key_1>=<value_1> ... <key_n>=<value_n>
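
For example, a hypothetical command that applies a region=primary label to one of the nodes listed earlier; the label key and value are illustrative:

  1. $ oc label node node1.example.com region=primary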

To see more detailed usage:

  1. $ oc label -h

Listing pods on nodes

To list all or selected pods on one or more nodes:

  1. $ oc adm manage-node <node1> <node2> \
  2. --list-pods [--pod-selector=<pod_selector>] [-o json|yaml]

To list all or selected pods on selected nodes:

  1. $ oc adm manage-node --selector=<node_selector> \
  2. --list-pods [--pod-selector=<pod_selector>] [-o json|yaml]
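
For example, using the node names listed earlier in this topic:

  1. $ oc adm manage-node node1.example.com node2.example.com --list-pods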

Marking nodes as unschedulable or schedulable

By default, healthy nodes with a Ready status are marked as schedulable, meaning that new pods are allowed for placement on the node. Manually marking a node as unschedulable blocks any new pods from being scheduled on the node. Existing pods on the node are not affected.

To mark a node or nodes as unschedulable:

  1. $ oc adm manage-node <node1> <node2> --schedulable=false

For example:

  1. $ oc adm manage-node node1.example.com --schedulable=false
  2. NAME LABELS STATUS
  3. node1.example.com kubernetes.io/hostname=node1.example.com Ready,SchedulingDisabled

To mark a currently unschedulable node or nodes as schedulable:

  1. $ oc adm manage-node <node1> <node2> --schedulable

Alternatively, instead of specifying node names, you can use the --selector=<node_selector> option to mark matching nodes as schedulable or unschedulable.
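
For example, a sketch that marks all nodes carrying a hypothetical region=primary label as unschedulable:

  1. $ oc adm manage-node --selector='region=primary' --schedulable=false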

Evacuating pods on nodes

Evacuating pods allows you to migrate all or selected pods from a given node. Nodes must first be marked unschedulable to perform pod evacuation.

Only pods backed by a replication controller can be evacuated; the replication controllers create new pods on other nodes and remove the existing pods from the specified node(s). Bare pods, meaning those not backed by a replication controller, are unaffected by default. You can evacuate a subset of pods by specifying a pod selector. Because pod selectors are based on labels, all pods with the specified label are evacuated.

To evacuate all or selected pods on a node:

  1. $ oc adm drain <node> [--pod-selector=<pod_selector>]

You can force deletion of bare pods by using the --force option. When set to true, deletion continues even if there are pods that are not managed by a ReplicationController, ReplicaSet, Job, DaemonSet, or StatefulSet:

  1. $ oc adm drain <node> --force=true

You can use --grace-period to set a period of time in seconds for each pod to terminate gracefully. If negative, the default value specified in the pod is used:

  1. $ oc adm drain <node> --grace-period=-1

You can use --ignore-daemonsets and set it to true to ignore daemonset-managed pods:

  1. $ oc adm drain <node> --ignore-daemonsets=true

You can use --timeout to set the length of time to wait before giving up. A value of 0 sets an infinite length of time:

  1. $ oc adm drain <node> --timeout=5s

You can use --delete-local-data and set it to true to continue deletion even if there are pods using emptyDir (local data that is deleted when the node is drained):

  1. $ oc adm drain <node> --delete-local-data=true

To list objects that will be migrated without actually performing the evacuation, use the --dry-run option and set it to true:

  1. $ oc adm drain <node> --dry-run=true

Instead of specifying a specific node name, you can use the --selector=<node_selector> option to evacuate pods on nodes that match the selector.
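
For example, one possible combination of the options described above; adjust the options to your workloads:

  1. $ oc adm drain node1.example.com --ignore-daemonsets=true --force=true \
  2. --delete-local-data=true --grace-period=30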

Rebooting nodes

To reboot a node without causing an outage for applications running on the platform, it is important to first evacuate the pods. For pods that are made highly available by the routing tier, nothing else needs to be done. For other pods needing storage, typically databases, it is critical to ensure that they can remain in operation with one pod temporarily going offline. While implementing resiliency for stateful pods is different for each application, in all cases it is important to configure the scheduler to use node anti-affinity to ensure that the pods are properly spread across available nodes.

Another challenge is how to handle nodes that are running critical infrastructure such as the router or the registry. The same node evacuation process applies, though it is important to understand certain edge cases.

Infrastructure nodes

Infrastructure nodes are nodes that are labeled to run pieces of the OKD environment. Currently, the easiest way to manage node reboots is to ensure that there are at least three nodes available to run infrastructure. The scenario below demonstrates a common mistake that can lead to service interruptions for the applications running on OKD when only two nodes are available.

  • Node A is marked unschedulable and all pods are evacuated.

  • The registry pod running on that node is now redeployed on node B. This means node B is now running both registry pods.

  • Node B is now marked unschedulable and is evacuated.

  • The service exposing the two pod endpoints on node B, for a brief period of time, loses all endpoints until they are redeployed to node A.

The same process using three infrastructure nodes does not result in a service disruption. However, due to pod scheduling, the last node that is evacuated and brought back in to rotation is left running zero registries. The other two nodes will run two and one registries respectively. The best solution is to rely on pod anti-affinity. This is an alpha feature in Kubernetes that is available for testing now, but is not yet supported for production workloads.

Using pod anti-affinity

Pod anti-affinity is slightly different than node anti-affinity. Node anti-affinity can be violated if there are no other suitable locations to deploy a pod. Pod anti-affinity can be set to either required or preferred.

  1. apiVersion: v1
  2. kind: Pod
  3. metadata:
  4.   name: with-pod-antiaffinity
  5. spec:
  6.   affinity:
  7.     podAntiAffinity: (1)
  8.       preferredDuringSchedulingIgnoredDuringExecution: (2)
  9.       - weight: 100 (3)
  10.         podAffinityTerm:
  11.           labelSelector:
  12.             matchExpressions:
  13.             - key: docker-registry (4)
  14.               operator: In (5)
  15.               values:
  16.               - default
  17.           topologyKey: kubernetes.io/hostname
1Stanza to configure pod anti-affinity.
2Defines a preferred rule.
3Specifies a weight for a preferred rule. The node with the highest weight is preferred.
4Description of the pod label that determines when the anti-affinity rule applies. Specify a key and value for the label.
5The operator represents the relationship between the label on the existing pod and the set of values in the matchExpression parameters in the specification for the new pod. Can be In, NotIn, Exists, or DoesNotExist.

This example assumes the container image registry pod has a label of docker-registry=default. Pod anti-affinity can use any Kubernetes match expression.

The last required step is to enable the MatchInterPodAffinity scheduler predicate in /etc/origin/master/scheduler.json. With this in place, if only two infrastructure nodes are available and one is rebooted, the container image registry pod is prevented from running on the other node. oc get pods reports the pod as unready until a suitable node is available. Once a node is available and all pods are back in ready state, the next node can be restarted.
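
A minimal sketch of the relevant entry in /etc/origin/master/scheduler.json; a real file contains your full list of predicates and priorities, which are omitted here:

  1. {
  2.   "apiVersion": "v1",
  3.   "kind": "Policy",
  4.   "predicates": [
  5.     {"name": "MatchInterPodAffinity"}
  6.   ],
  7.   "priorities": []
  8. }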

Handling nodes running routers

In most cases, a pod running an OKD router exposes a host port. The PodFitsPorts scheduler predicate ensures that no router pods using the same port can run on the same node, and pod anti-affinity is achieved. If the routers rely on IP failover for high availability, nothing else is needed. For router pods relying on an external service such as AWS Elastic Load Balancing for high availability, it is that service’s responsibility to react to router pod restarts.

In rare cases, a router pod might not have a host port configured. In those cases, it is important to follow the recommended restart process for infrastructure nodes.

Modifying nodes

During installation, OKD creates a configmap in the openshift-node project for each type of node group:

  • node-config-master

  • node-config-infra

  • node-config-compute

  • node-config-all-in-one

  • node-config-master-infra

To make configuration changes to an existing node, edit the appropriate configuration map. A sync pod on each node watches for changes in the configuration maps. During installation, the sync pods are created by using sync DaemonSets, and a /etc/origin/node/node-config.yaml file, where the node configuration parameters reside, is added to each node. When a sync pod detects a configuration map change, it updates the node-config.yaml on all nodes in that node group and restarts the atomic-openshift-node.service on the appropriate nodes.

  1. $ oc get cm -n openshift-node
  2. NAME DATA AGE
  3. node-config-all-in-one 1 1d
  4. node-config-compute 1 1d
  5. node-config-infra 1 1d
  6. node-config-master 1 1d
  7. node-config-master-infra 1 1d
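
To inspect the current contents of a node group’s configuration map, for example the compute group used in the sample that follows:

  1. $ oc get cm node-config-compute -n openshift-node -o yaml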

Sample configuration map for the node-config-compute group

  1. apiVersion: v1
  2. authConfig: (1)
  3.   authenticationCacheSize: 1000
  4.   authenticationCacheTTL: 5m
  5.   authorizationCacheSize: 1000
  6.   authorizationCacheTTL: 5m
  7. dnsBindAddress: 127.0.0.1:53
  8. dnsDomain: cluster.local
  9. dnsIP: 0.0.0.0 (2)
  10. dnsNameservers: null
  11. dnsRecursiveResolvConf: /etc/origin/node/resolv.conf
  12. dockerConfig:
  13.   dockerShimRootDirectory: /var/lib/dockershim
  14.   dockerShimSocket: /var/run/dockershim.sock
  15.   execHandlerName: native
  16. enableUnidling: true
  17. imageConfig:
  18.   format: registry.reg-aws.openshift.com/openshift3/ose-${component}:${version}
  19.   latest: false
  20. iptablesSyncPeriod: 30s
  21. kind: NodeConfig
  22. kubeletArguments: (3)
  23.   bootstrap-kubeconfig:
  24.   - /etc/origin/node/bootstrap.kubeconfig
  25.   cert-dir:
  26.   - /etc/origin/node/certificates
  27.   cloud-config:
  28.   - /etc/origin/cloudprovider/aws.conf
  29.   cloud-provider:
  30.   - aws
  31.   enable-controller-attach-detach:
  32.   - 'true'
  33.   feature-gates:
  34.   - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
  35.   node-labels:
  36.   - node-role.kubernetes.io/compute=true
  37.   pod-manifest-path:
  38.   - /etc/origin/node/pods (4)
  39.   rotate-certificates:
  40.   - 'true'
  41. masterClientConnectionOverrides:
  42.   acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
  43.   burst: 40
  44.   contentType: application/vnd.kubernetes.protobuf
  45.   qps: 20
  46. masterKubeConfig: node.kubeconfig
  47. networkConfig: (5)
  48.   mtu: 8951
  49.   networkPluginName: redhat/openshift-ovs-subnet (6)
  50. servingInfo: (7)
  51.   bindAddress: 0.0.0.0:10250
  52.   bindNetwork: tcp4
  53.   clientCA: client-ca.crt (8)
  54. volumeConfig:
  55.   localQuota:
  56.     perFSGroup: null
  57.   volumeDirectory: /var/lib/origin/openshift.local.volumes
1Authentication and authorization configuration options.
2IP address prepended to a pod’s /etc/resolv.conf.
3Key value pairs that are passed directly to the Kubelet that match the Kubelet’s command line arguments.
4The path to the pod manifest file or directory. A directory must contain one or more manifest files. OKD uses the manifest files to create pods on the node.
5The pod network settings on the node.
6Software defined network (SDN) plug-in. Set to redhat/openshift-ovs-subnet for the ovs-subnet plug-in; redhat/openshift-ovs-multitenant for the ovs-multitenant plug-in; or redhat/openshift-ovs-networkpolicy for the ovs-networkpolicy plug-in.
7Certificate information for the node.
8Optional: PEM-encoded certificate bundle. If set, a valid client certificate must be presented and validated against the certificate authorities in the specified file before the request headers are checked for user names.

Do not manually modify the /etc/origin/node/node-config.yaml file.

Configuring node resources

You can configure node resources by adding kubelet arguments to the node configuration map.

  1. Edit the configuration map:

    1. $ oc edit cm node-config-compute -n openshift-node
  2. Add the kubeletArguments section and specify your options:

    1. kubeletArguments:
    2.   max-pods: (1)
    3.   - "40"
    4.   resolv-conf: (2)
    5.   - "/etc/resolv.conf"
    6.   image-gc-high-threshold: (3)
    7.   - "90"
    8.   image-gc-low-threshold: (4)
    9.   - "80"
    10.   kube-api-qps: (5)
    11.   - "20"
    12.   kube-api-burst: (6)
    13.   - "40"
    1Maximum number of pods that can run on this kubelet.
    2Resolver configuration file used as the basis for the container DNS resolution configuration.
    3The percent of disk usage after which image garbage collection is always run. Default: 90%
    4The percent of disk usage before which image garbage collection is never run. Lowest disk usage to garbage collect to. Default: 80%
    5The queries per second (QPS) to use while talking with the Kubernetes API server.
    6The burst to use while talking with the Kubernetes API server.

    To view all available kubelet options:

    1. $ hyperkube kubelet -h
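
    After the sync pods roll out the updated configuration map (see Modifying nodes), one way to spot-check the change is to confirm the node’s reported pod capacity; a sketch, with the exact lines depending on your node:

    1. $ oc describe node <node> | grep -A 4 Capacity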

Setting maximum pods per node

See the Cluster maximums page for the maximum supported limits for each version of OKD.

In the /etc/origin/node/node-config.yaml file, two parameters control the maximum number of pods that can be scheduled to a node: pods-per-core and max-pods. When both options are in use, the lower of the two limits the number of pods on a node. Exceeding these values can result in:

  • Increased CPU utilization on both OKD and Docker.

  • Slow pod scheduling.

  • Potential out-of-memory scenarios (depends on the amount of memory in the node).

  • Exhausting the pool of IP addresses.

  • Resource overcommitting, leading to poor user application performance.

In Kubernetes, a pod that holds a single container actually uses two containers. The second container, the pod infrastructure (pause) container, sets up networking before the application container starts. Therefore, a system running 10 pods actually has 20 containers running.
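
On a node that uses the Docker runtime, you can see this by counting the pod infrastructure (sandbox) containers, which dockershim names with a k8s_POD_ prefix; a rough sketch:

  1. # docker ps | grep -c k8s_POD_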

pods-per-core sets the number of pods the node can run based on the number of processor cores on the node. For example, if pods-per-core is set to 10 on a node with 4 processor cores, the maximum number of pods allowed on the node will be 40.

  1. kubeletArguments:
  2.   pods-per-core:
  3.   - "10"

Setting pods-per-core to 0 disables this limit.

max-pods sets the number of pods the node can run to a fixed value, regardless of the properties of the node. Cluster Limits documents maximum supported values for max-pods.

  1. kubeletArguments:
  2.   max-pods:
  3.   - "250"

The values in the examples above match the defaults: 10 for pods-per-core and 250 for max-pods. This means that unless the node has 25 cores or more, pods-per-core is, by default, the limiting factor.

Resetting Docker storage

As you download container images and run and delete containers, Docker does not always free up mapped disk space. As a result, over time you can run out of space on a node, which might prevent OKD from being able to create new pods or cause pod creation to take several minutes.

For example, the following shows pods that are still in the ContainerCreating state after six minutes and the events log shows a FailedSync event.

  1. $ oc get pod
  2. NAME READY STATUS RESTARTS AGE
  3. cakephp-mysql-persistent-1-build 0/1 ContainerCreating 0 6m
  4. mysql-1-9767d 0/1 ContainerCreating 0 2m
  5. mysql-1-deploy 0/1 ContainerCreating 0 6m
  6. $ oc get events
  7. LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
  8. 6m 6m 1 cakephp-mysql-persistent-1-build Pod Normal Scheduled default-scheduler Successfully assigned cakephp-mysql-persistent-1-build to ip-172-31-71-195.us-east-2.compute.internal
  9. 2m 5m 4 cakephp-mysql-persistent-1-build Pod Warning FailedSync kubelet, ip-172-31-71-195.us-east-2.compute.internal Error syncing pod
  10. 2m 4m 4 cakephp-mysql-persistent-1-build Pod Normal SandboxChanged kubelet, ip-172-31-71-195.us-east-2.compute.internal Pod sandbox changed, it will be killed and re-created.

One solution to this problem is to reset Docker storage to remove artifacts not needed by Docker.

On the node where you want to reset Docker storage:

  1. Run the following command to mark the node as unschedulable:

    1. $ oc adm manage-node <node> --schedulable=false
  2. Run the following command to shut down Docker and the atomic-openshift-node service:

    1. $ systemctl stop docker atomic-openshift-node
  3. Run the following command to remove the local volume directory:

    1. $ rm -rf /var/lib/origin/openshift.local.volumes

    This command clears the local image cache. As a result, images, including ose-* images, will need to be re-pulled. This might result in slower pod start times while the image store recovers.

  4. Remove the /var/lib/docker directory:

    1. $ rm -rf /var/lib/docker
  5. Run the following command to reset the Docker storage:

    1. $ docker-storage-setup --reset
  6. Run the following command to recreate the Docker storage:

    1. $ docker-storage-setup
  7. Recreate the /var/lib/docker directory:

    1. $ mkdir /var/lib/docker
  8. Run the following command to restart Docker and the atomic-openshift-node service:

    1. $ systemctl start docker atomic-openshift-node
  9. Restart the node service:

    1. # systemctl restart atomic-openshift-node.service
  10. Run the following command to mark the node as schedulable:

    1. $ oc adm manage-node <node> --schedulable=true