Workload partitioning on single node OpenShift

In resource-constrained environments, such as single node production deployments, it is advantageous to reserve most of the CPU resources for your own workloads and configure OKD to run on a fixed number of CPUs within the host. In these environments, management workloads, including the control plane, need to be configured to use fewer resources than they might by default in normal clusters. You can isolate the OKD services, cluster management workloads, and infrastructure pods to run on a reserved set of CPUs.

When you use workload partitioning, the CPU resources used by OKD for cluster management are isolated to a partitioned set of CPU resources on a single node cluster. This partitioning isolates cluster management functions to the defined number of CPUs. All cluster management functions operate solely on that cpuset configuration.

The minimum number of reserved CPUs required for the management partition for a single node cluster is four CPU Hyper-Threads (HTs). The set of pods that make up the baseline OKD installation and a set of typical add-on Operators are annotated for inclusion in the management workload partition. These pods operate normally within the minimum size cpuset configuration. Including Operators or workloads outside of the set of accepted management pods requires additional CPU HTs to be added to that partition.
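For example, pods included in the management partition carry the target.workload.openshift.io/management annotation that CRI-O is configured to recognize later in this procedure. The following is an illustrative sketch only; the exact annotation value depends on the component:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-management-pod
      annotations:
        # Marks this pod for the management workload partition.
        # The effect value shown here is illustrative.
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'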

Workload partitioning isolates user workloads from the platform workloads. It uses the normal scheduling capabilities of Kubernetes to manage the number of pods that can be placed onto the reserved cores, and it prevents cluster management workloads and user workloads from mixing.

When using workload partitioning, you must install the Performance Addon Operator and apply the performance profile:

  • Workload partitioning pins the OKD infrastructure pods to a defined cpuset configuration.

  • The Performance Addon Operator performance profile pins the systemd services to a defined cpuset configuration.

  • These two cpuset configurations must match; see the sketch after this list.
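For example, a performance profile whose reserved CPU set matches the cpuset used in the rest of this procedure might look like the following sketch. The profile name and the isolated range are illustrative and depend on the host topology:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: workload-partitioning
    spec:
      cpu:
        # Must match the cpuset configured for CRI-O and kubelet below.
        reserved: "0-1,52-53"
        # The remaining host CPUs, left for user workloads; illustrative.
        isolated: "2-51,54-103"
      nodeSelector:
        node-role.kubernetes.io/master: ""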

Workload partitioning introduces a new extended resource of <workload-type>.workload.openshift.io/cores for each defined CPU pool, or workload type. Kubelet advertises these new resources, and CPU requests by pods allocated to the pool are accounted for within the corresponding resource rather than the typical cpu resource. When workload partitioning is enabled, the <workload-type>.workload.openshift.io/cores resource allows access to the CPU capacity of the host, not just the default CPU pool.
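For example, with a management pool defined, kubelet advertises the pool capacity on the node. A simplified sketch of how this might appear in the node status follows; the values shown are illustrative:

    status:
      capacity:
        cpu: "104"
        # Extended resource advertised by kubelet for the management pool.
        # The value shown here is illustrative.
        management.workload.openshift.io/cores: "104"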

Enabling workload partitioning

Use the following procedure to enable workload partitioning for your single node deployments.

Procedure

  1. To enable workload partitioning, provide a MachineConfig manifest during installation that configures CRI-O and kubelet to recognize the workload types. The following example shows a manifest without the encoded file content:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: master
      name: 02-master-workload-partitioning
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,encoded-content-here
            mode: 420
            overwrite: true
            path: /etc/crio/crio.conf.d/01-workload-partitioning
            user:
              name: root
          - contents:
              source: data:text/plain;charset=utf-8;base64,encoded-content-here
            mode: 420
            overwrite: true
            path: /etc/kubernetes/openshift-workload-pinning
            user:
              name: root
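     For example, with installer-provisioned manifests, you can generate the installation manifests and copy this file into the resulting manifests directory before you create the cluster. The directory and file names here are illustrative:

    openshift-install create manifests --dir=<installation_directory>
    cp 02-master-workload-partitioning.yaml <installation_directory>/manifests/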
  2. Generate the base64-encoded contents of /etc/crio/crio.conf.d/01-workload-partitioning for the workload partitioning file entry. The cpuset value varies based on the deployment:

    cat <<EOF | base64 -w0
    [crio.runtime.workloads.management]
    activation_annotation = "target.workload.openshift.io/management"
    annotation_prefix = "resources.workload.openshift.io"
    resources = { "cpushares" = 0, "cpuset" = "0-1,52-53" }
    EOF
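     The base64 string that this command prints replaces the first encoded-content-here placeholder in the MachineConfig manifest, in the file entry for /etc/crio/crio.conf.d/01-workload-partitioning.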
  3. Generate the base64-encoded contents of /etc/kubernetes/openshift-workload-pinning for the workload pinning file entry. The cpuset value varies based on the deployment:

    cat <<EOF | base64 -w0
    {
      "management": {
        "cpuset": "0-1,52-53"
      }
    }
    EOF
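     This output replaces the second encoded-content-here placeholder, in the file entry for /etc/kubernetes/openshift-workload-pinning. If you want to sanity-check either encoded value, you can decode it again; <encoded-content> is a placeholder for the base64 string from the manifest:

    echo "<encoded-content>" | base64 -d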