Configuring the Linux cgroup version on your nodes

By default, OKD uses Linux control group version 2 (cgroup v2) in your cluster. You can switch to Linux control group version 1 (cgroup v1), if needed.

cgroup v2 is the next version of the kernel control group and offers multiple improvements. However, it can have unwanted effects on workloads and tools that expect the cgroup v1 file system layout.

Configuring Linux cgroup v1

You can switch to Linux control group version 1 (cgroup v1), if needed, by using a machine config. Enabling cgroup v1 in OKD disables the cgroup v2 controllers and hierarchies in your cluster.
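
Before you switch, you can confirm which cgroup version a node currently uses. The following command is only a minimal sketch: substitute one of your own node names, and expect it to print cgroup2fs for cgroup v2 or tmpfs for cgroup v1:

    $ oc debug node/<node_name> -- chroot /host stat -c %T -f /sys/fs/cgroup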

Prerequisites

  • You have administrative privileges on a working OKD cluster.

Procedure

  1. Create a MachineConfig object YAML file that identifies the kernel argument (for example, 99-worker-cgroup-v1.yaml):

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker (1)
      name: 99-worker-cgroup-v1 (2)
    spec:
      config:
        ignition:
          version: 3.2.0
      kernelArguments:
        - systemd.unified_cgroup_hierarchy=0 (3)
    (1) Applies the new kernel argument only to worker nodes.
    (2) Applies a name to the machine config.
    (3) Configures cgroup v1 on the associated nodes.
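
    Optionally, before creating the object, you can run a client-side dry run to confirm that the manifest parses; this extra check is only a suggestion and is not required by the procedure:

    $ oc create -f 99-worker-cgroup-v1.yaml --dry-run=client -o yaml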
  2. Create the new machine config:

    $ oc create -f 99-worker-cgroup-v1.yaml
  3. Check that the new machine config was added:

    $ oc get MachineConfig

    Example output

    NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
    00-master                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    00-worker                                          52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-master-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-container-runtime                        52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    01-worker-kubelet                                  52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-cgroup-v1                                                                           3.2.0             105s
    99-master-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-master-ssh                                                                                 3.2.0             40m
    99-worker-generated-registries                     52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    99-worker-ssh                                                                                 3.2.0             40m
    rendered-master-23e785de7587df95a4b517e0647e5ab7   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
    rendered-master-c5e92d98103061c4818cfcefcf462770   60746a843e7ef8855ae00f2ffcb655c53e0e8296   3.2.0             115s
    rendered-worker-5d596d9293ca3ea80c896a1191735bb1   52dd3ba6a9a527fc3ab42afac8d12b693534c8c9   3.2.0             33m
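
    To confirm that the kernel argument was recorded on the new object, you can also print its spec; this is an optional check, not part of the required steps:

    $ oc get machineconfig 99-worker-cgroup-v1 -o jsonpath='{.spec.kernelArguments}{"\n"}'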
  4. Check the nodes:

    $ oc get nodes

    Example output

    NAME                           STATUS                     ROLES    AGE   VERSION
    ip-10-0-136-161.ec2.internal   Ready                      worker   28m   v1.27.3
    ip-10-0-136-243.ec2.internal   Ready                      master   34m   v1.27.3
    ip-10-0-141-105.ec2.internal   Ready,SchedulingDisabled   worker   28m   v1.27.3
    ip-10-0-142-249.ec2.internal   Ready                      master   34m   v1.27.3
    ip-10-0-153-11.ec2.internal    Ready                      worker   28m   v1.27.3
    ip-10-0-153-150.ec2.internal   Ready                      master   34m   v1.27.3

    You can see that scheduling is disabled on each worker node while the machine config change is applied to it.
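
    Rather than repeatedly running oc get nodes, you can wait for the worker machine config pool to report that every node has the new configuration. This is an optional convenience; the pool name worker is the default, and the 30m timeout is only an illustrative value:

    $ oc wait mcp/worker --for=condition=Updated --timeout=30m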

  5. After a node returns to the Ready state, start a debug session for that node:

    $ oc debug node/<node_name>
  6. Set /host as the root directory within the debug shell:

    sh-4.4# chroot /host
  7. Check that the /sys/fs/cgroup mount now reports the tmpfs file system type instead of cgroup2fs, which indicates that the node is using cgroup v1:

    $ stat -c %T -f /sys/fs/cgroup

    Example output

    tmpfs
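
    As an additional check that is not part of the procedure above, you can confirm from the same debug shell that the node booted with the kernel argument set by the machine config; the command prints the argument if it took effect:

    sh-4.4# grep -o systemd.unified_cgroup_hierarchy=0 /proc/cmdline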