Recommended single-node OpenShift cluster configuration for vDU application workloads

Use the following reference information to understand the single-node OpenShift configurations required to deploy virtual distributed unit (vDU) applications in the cluster. Configurations include cluster optimizations for high performance workloads, enabling workload partitioning, and minimizing the number of reboots required postinstallation.

Running low latency applications on OKD

OKD enables low latency processing for applications running on commercial off-the-shelf (COTS) hardware by using several technologies and specialized hardware devices:

Real-time kernel for RHCOS

Ensures workloads are handled with a high degree of process determinism.

CPU isolation

Avoids CPU scheduling delays and ensures CPU capacity is available consistently.

NUMA-aware topology management

Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the non-uniform memory access (NUMA) node. Pod resources for all Quality of Service (QoS) classes stay on the same NUMA node. This decreases latency and improves performance of the node.

Huge pages memory management

Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.

Precision timing synchronization using PTP

Allows synchronization between nodes in the network with sub-microsecond accuracy.

Running vDU application workloads requires a bare-metal host with sufficient resources to run OKD services and production workloads.

Table 1. Minimum resource requirements

Profile | vCPU | Memory | Storage
Minimum | 4 to 8 vCPU cores | 32GB of RAM | 120GB

One vCPU is equivalent to one physical core when simultaneous multithreading (SMT), or Hyper-Threading, is not enabled. When enabled, use the following formula to calculate the corresponding ratio:

  • (threads per core × cores) × sockets = vCPUs
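
For example, a host with one socket, 16 cores, and SMT enabled provides (2 threads per core × 16 cores) × 1 socket = 32 vCPUs.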

The server must have a Baseboard Management Controller (BMC) when booting with virtual media.

Configuring host firmware for low latency and high performance

Bare-metal hosts require the firmware to be configured before the host can be provisioned. The firmware configuration is dependent on the specific hardware and the particular requirements of your installation.

Procedure

  1. Set the UEFI/BIOS Boot Mode to UEFI.

  2. In the host boot sequence order, set Hard drive first.

  3. Apply the specific firmware configuration for your hardware. The following table describes a representative firmware configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.

    The exact firmware configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.

    Table 2. Sample firmware configuration for an Intel Xeon Skylake or Cascade Lake server

    Firmware setting | Configuration
    CPU Power and Performance Policy | Performance
    Uncore Frequency Scaling | Disabled
    Performance P-limit | Disabled
    Enhanced Intel SpeedStep® Tech | Enabled
    Intel Configurable TDP | Enabled
    Configurable TDP Level | Level 2
    Intel® Turbo Boost Technology | Enabled
    Energy Efficient Turbo | Disabled
    Hardware P-States | Disabled
    Package C-State | C0/C1 state
    C1E | Disabled
    Processor C6 | Disabled

Enable global SR-IOV and VT-d settings in the firmware for the host. These settings are relevant to bare-metal environments.

Connectivity prerequisites for managed cluster networks

Before you can install and provision a managed cluster with the GitOps Zero Touch Provisioning (ZTP) pipeline, the managed cluster host must meet the following networking prerequisites:

  • There must be bi-directional connectivity between the GitOps ZTP container in the hub cluster and the Baseboard Management Controller (BMC) of the target bare-metal host.

  • The managed cluster must be able to resolve and reach the API hostname of the hub and the *.apps hostname. Here is an example of the API hostname of the hub and the *.apps hostname:

    • api.hub-cluster.internal.domain.com

    • console-openshift-console.apps.hub-cluster.internal.domain.com

  • The hub cluster must be able to resolve and reach the API and *.apps hostname of the managed cluster. Here is an example of the API hostname of the managed cluster and *.apps hostname:

    • api.sno-managed-cluster-1.internal.domain.com

    • console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com

Workload partitioning in single-node OpenShift with GitOps ZTP

Workload partitioning configures OKD services, cluster management workloads, and infrastructure pods to run on a reserved number of host CPUs.

To configure workload partitioning with GitOps Zero Touch Provisioning (ZTP), you configure a cpuPartitioningMode field in the SiteConfig custom resource (CR) that you use to install the cluster and you apply a PerformanceProfile CR that configures the isolated and reserved CPUs on the host.

Configuring the SiteConfig CR enables workload partitioning at cluster installation time and applying the PerformanceProfile CR configures the specific allocation of CPUs to reserved and isolated sets. Both of these steps happen at different points during cluster provisioning.

Configuring workload partitioning by using the cpuPartitioningMode field in the SiteConfig CR is a Tech Preview feature in OKD 4.13.

Alternatively, you can specify cluster management CPU resources with the cpuset field of the SiteConfig custom resource (CR) and the reserved field of the group PolicyGenTemplate CR. The GitOps ZTP pipeline uses these values to populate the required fields in the workload partitioning MachineConfig CR (cpuset) and the PerformanceProfile CR (reserved) that configure the single-node OpenShift cluster. This method is a General Availability feature in OKD 4.14.
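
A minimal sketch of this alternative method follows, assuming a hypothetical single-node cluster named example-sno with four CPUs reserved for management; the cluster name, host name, and CPU ranges are illustrative only and must match your hardware and site plan.

Example SiteConfig fragment that sets the workload partitioning cpuset:

apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "example-sno"
  namespace: "example-sno"
spec:
  clusters:
    - clusterName: "example-sno"
      nodes:
        - hostName: "example-node.example.com"
          # CPUs reserved for cluster management workloads
          cpuset: "0-1,32-33"

Example group PolicyGenTemplate fragment that sets the matching reserved CPUs in the PerformanceProfile:

apiVersion: ran.openshift.io/v1
kind: PolicyGenTemplate
metadata:
  name: "group-du-sno"
  namespace: "ztp-group"
spec:
  bindingRules:
    group-du-sno: ""
  mcp: "master"
  sourceFiles:
    - fileName: PerformanceProfile.yaml
      policyName: "config-policy"
      spec:
        cpu:
          # Must specify the same CPUs as the SiteConfig cpuset field
          reserved: "0-1,32-33"
          isolated: "2-31,34-63"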

The workload partitioning configuration pins the OKD infrastructure pods to the reserved CPU set. Platform services such as systemd, CRI-O, and kubelet run on the reserved CPU set. The isolated CPU sets are exclusively allocated to your container workloads. Isolating CPUs ensures that the workload has guaranteed access to the specified CPUs without contention from other applications running on the same node. All CPUs that are not isolated should be reserved.

Ensure that reserved and isolated CPU sets do not overlap with each other.

Additional resources

  • For the recommended single-node OpenShift workload partitioning configuration, see Workload partitioning.

The ZTP pipeline applies the following custom resources (CRs) during cluster installation. These configuration CRs ensure that the cluster meets the feature and performance requirements necessary for running a vDU application.

When using the GitOps ZTP plugin and SiteConfig CRs for cluster deployment, the following MachineConfig CRs are included by default.

Use the SiteConfig extraManifests filter to alter the CRs that are included by default. For more information, see Advanced managed cluster configuration with SiteConfig CRs.
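
For example, the following SiteConfig fragment is a minimal sketch of the extraManifests filter, assuming a hypothetical cluster named example-sno; it keeps the default CRs but excludes one of the files listed in this section:

apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "example-sno"
  namespace: "example-sno"
spec:
  clusters:
    - clusterName: "example-sno"
      extraManifests:
        filter:
          # Include all default CRs except those listed under exclude
          inclusionDefault: include
          exclude:
            - 03-sctp-machine-config-worker.yaml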

Workload partitioning

Single-node OpenShift clusters that run DU workloads require workload partitioning. This limits the cores allowed to run platform services, maximizing the CPU cores available for application payloads.

Workload partitioning can be enabled during cluster installation only. You cannot disable workload partitioning postinstallation. You can, however, change the set of CPUs assigned to the isolated and reserved sets through the PerformanceProfile CR. Changes to CPU settings cause the node to reboot.

Upgrading from OKD 4.12 to 4.13+

When transitioning to using cpuPartitioningMode for enabling workload partitioning, remove the workload partitioning MachineConfig CRs from the /extra-manifest folder that you use to provision the cluster.

Recommended SiteConfig CR configuration for workload partitioning

apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "<site_name>"
  namespace: "<site_name>"
spec:
  baseDomain: "example.com"
  cpuPartitioningMode: AllNodes (1)

(1) Set the cpuPartitioningMode field to AllNodes to configure workload partitioning for all nodes in the cluster.

Verification

Check that the applications and cluster system CPU pinning is correct. Run the following commands:

  1. Open a remote shell prompt to the managed cluster:

     $ oc debug node/example-sno-1

  2. Check that the user applications CPU pinning is correct:

     sh-4.4# pgrep ovn | while read i; do taskset -cp $i; done

     Example output

     pid 8481's current affinity list: 0-3
     pid 8726's current affinity list: 0-3
     pid 9088's current affinity list: 0-3
     pid 9945's current affinity list: 0-3
     pid 10387's current affinity list: 0-3
     pid 12123's current affinity list: 0-3
     pid 13313's current affinity list: 0-3

  3. Check that the system applications CPU pinning is correct:

     sh-4.4# pgrep systemd | while read i; do taskset -cp $i; done

     Example output

     pid 1's current affinity list: 0-3
     pid 938's current affinity list: 0-3
     pid 962's current affinity list: 0-3
     pid 1197's current affinity list: 0-3

Reduced platform management footprint

To reduce the overall management footprint of the platform, a MachineConfig custom resource (CR) is required that places all Kubernetes-specific mount points in a new namespace separate from the host operating system. The following example MachineConfig CR, which embeds its helper scripts as base64-encoded files, illustrates this configuration.

Recommended container mount namespace configuration (01-container-mount-ns-and-kubelet-conf-master.yaml)

  1. apiVersion: machineconfiguration.openshift.io/v1
  2. kind: MachineConfig
  3. metadata:
  4. labels:
  5. machineconfiguration.openshift.io/role: master
  6. name: container-mount-namespace-and-kubelet-conf-master
  7. spec:
  8. config:
  9. ignition:
  10. version: 3.2.0
  11. storage:
  12. files:
  13. - contents:
  14. source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCmRlYnVnKCkgewogIGVjaG8gJEAgPiYyCn0KCnVzYWdlKCkgewogIGVjaG8gVXNhZ2U6ICQoYmFzZW5hbWUgJDApIFVOSVQgW2VudmZpbGUgW3Zhcm5hbWVdXQogIGVjaG8KICBlY2hvIEV4dHJhY3QgdGhlIGNvbnRlbnRzIG9mIHRoZSBmaXJzdCBFeGVjU3RhcnQgc3RhbnphIGZyb20gdGhlIGdpdmVuIHN5c3RlbWQgdW5pdCBhbmQgcmV0dXJuIGl0IHRvIHN0ZG91dAogIGVjaG8KICBlY2hvICJJZiAnZW52ZmlsZScgaXMgcHJvdmlkZWQsIHB1dCBpdCBpbiB0aGVyZSBpbnN0ZWFkLCBhcyBhbiBlbnZpcm9ubWVudCB2YXJpYWJsZSBuYW1lZCAndmFybmFtZSciCiAgZWNobyAiRGVmYXVsdCAndmFybmFtZScgaXMgRVhFQ1NUQVJUIGlmIG5vdCBzcGVjaWZpZWQiCiAgZXhpdCAxCn0KClVOSVQ9JDEKRU5WRklMRT0kMgpWQVJOQU1FPSQzCmlmIFtbIC16ICRVTklUIHx8ICRVTklUID09ICItLWhlbHAiIHx8ICRVTklUID09ICItaCIgXV07IHRoZW4KICB1c2FnZQpmaQpkZWJ1ZyAiRXh0cmFjdGluZyBFeGVjU3RhcnQgZnJvbSAkVU5JVCIKRklMRT0kKHN5c3RlbWN0bCBjYXQgJFVOSVQgfCBoZWFkIC1uIDEpCkZJTEU9JHtGSUxFI1wjIH0KaWYgW1sgISAtZiAkRklMRSBdXTsgdGhlbgogIGRlYnVnICJGYWlsZWQgdG8gZmluZCByb290IGZpbGUgZm9yIHVuaXQgJFVOSVQgKCRGSUxFKSIKICBleGl0CmZpCmRlYnVnICJTZXJ2aWNlIGRlZmluaXRpb24gaXMgaW4gJEZJTEUiCkVYRUNTVEFSVD0kKHNlZCAtbiAtZSAnL15FeGVjU3RhcnQ9LipcXCQvLC9bXlxcXSQvIHsgcy9eRXhlY1N0YXJ0PS8vOyBwIH0nIC1lICcvXkV4ZWNTdGFydD0uKlteXFxdJC8geyBzL15FeGVjU3RhcnQ9Ly87IHAgfScgJEZJTEUpCgppZiBbWyAkRU5WRklMRSBdXTsgdGhlbgogIFZBUk5BTUU9JHtWQVJOQU1FOi1FWEVDU1RBUlR9CiAgZWNobyAiJHtWQVJOQU1FfT0ke0VYRUNTVEFSVH0iID4gJEVOVkZJTEUKZWxzZQogIGVjaG8gJEVYRUNTVEFSVApmaQo=
  15. mode: 493
  16. path: /usr/local/bin/extractExecStart
  17. - contents:
  18. source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKbnNlbnRlciAtLW1vdW50PS9ydW4vY29udGFpbmVyLW1vdW50LW5hbWVzcGFjZS9tbnQgIiRAIgo=
  19. mode: 493
  20. path: /usr/local/bin/nsenterCmns
  21. systemd:
  22. units:
  23. - contents: |
  24. [Unit]
  25. Description=Manages a mount namespace that both kubelet and crio can use to share their container-specific mounts
  26. [Service]
  27. Type=oneshot
  28. RemainAfterExit=yes
  29. RuntimeDirectory=container-mount-namespace
  30. Environment=RUNTIME_DIRECTORY=%t/container-mount-namespace
  31. Environment=BIND_POINT=%t/container-mount-namespace/mnt
  32. ExecStartPre=bash -c "findmnt ${RUNTIME_DIRECTORY} || mount --make-unbindable --bind ${RUNTIME_DIRECTORY} ${RUNTIME_DIRECTORY}"
  33. ExecStartPre=touch ${BIND_POINT}
  34. ExecStart=unshare --mount=${BIND_POINT} --propagation slave mount --make-rshared /
  35. ExecStop=umount -R ${RUNTIME_DIRECTORY}
  36. name: container-mount-namespace.service
  37. - dropins:
  38. - contents: |
  39. [Unit]
  40. Wants=container-mount-namespace.service
  41. After=container-mount-namespace.service
  42. [Service]
  43. ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
  44. EnvironmentFile=-/%t/%N-execstart.env
  45. ExecStart=
  46. ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
  47. ${ORIG_EXECSTART}"
  48. name: 90-container-mount-namespace.conf
  49. name: crio.service
  50. - dropins:
  51. - contents: |
  52. [Unit]
  53. Wants=container-mount-namespace.service
  54. After=container-mount-namespace.service
  55. [Service]
  56. ExecStartPre=/usr/local/bin/extractExecStart %n /%t/%N-execstart.env ORIG_EXECSTART
  57. EnvironmentFile=-/%t/%N-execstart.env
  58. ExecStart=
  59. ExecStart=bash -c "nsenter --mount=%t/container-mount-namespace/mnt \
  60. ${ORIG_EXECSTART} --housekeeping-interval=30s"
  61. name: 90-container-mount-namespace.conf
  62. - contents: |
  63. [Service]
  64. Environment="OPENSHIFT_MAX_HOUSEKEEPING_INTERVAL_DURATION=60s"
  65. Environment="OPENSHIFT_EVICTION_MONITORING_PERIOD_DURATION=30s"
  66. name: 30-kubelet-interval-tuning.conf
  67. name: kubelet.service

SCTP

Stream Control Transmission Protocol (SCTP) is a key protocol used in RAN applications. This MachineConfig object adds the SCTP kernel module to the node to enable this protocol.

Recommended control plane node SCTP configuration (03-sctp-machine-config-master.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: load-sctp-module-master
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/modprobe.d/sctp-blacklist.conf
        - contents:
            source: data:text/plain;charset=utf-8,sctp
          filesystem: root
          mode: 420
          path: /etc/modules-load.d/sctp-load.conf

Recommended worker node SCTP configuration (03-sctp-machine-config-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: load-sctp-module-worker
spec:
  config:
    ignition:
      version: 2.2.0
    storage:
      files:
        - contents:
            source: data:,
            verification: {}
          filesystem: root
          mode: 420
          path: /etc/modprobe.d/sctp-blacklist.conf
        - contents:
            source: data:text/plain;charset=utf-8,sctp
          filesystem: root
          mode: 420
          path: /etc/modules-load.d/sctp-load.conf

Accelerated container startup

The following MachineConfig CR configures core OpenShift processes and containers to use all available CPU cores during system startup and shutdown. This accelerates the system recovery during initial boot and reboots.

Recommended accelerated container startup configuration (04-accelerated-container-startup-master.yaml)

  1. apiVersion: machineconfiguration.openshift.io/v1
  2. kind: MachineConfig
  3. metadata:
  4. labels:
  5. machineconfiguration.openshift.io/role: master
  6. name: 04-accelerated-container-startup-master
  7. spec:
  8. config:
  9. ignition:
  10. version: 3.2.0
  11. storage:
  12. files:
  13. - contents:
  14. source: data:text/plain;charset=utf-8;base64,#!/bin/bash
#
# Temporarily reset the core system processes's CPU affinity to be unrestricted to accelerate startup and shutdown
#
# The defaults below can be overridden via environment variables
#

# The default set of critical processes whose affinity should be temporarily unbound:
CRITICAL_PROCESSES=${CRITICAL_PROCESSES:-"crio kubelet NetworkManager conmon dbus"}

# Default wait time is 600s = 10m:
MAXIMUM_WAIT_TIME=${MAXIMUM_WAIT_TIME:-600}

# Default steady-state threshold = 2%
# Allowed values:
#  4  - absolute pod count (+/-)
#  4% - percent change (+/-)
#  -1 - disable the steady-state check
STEADY_STATE_THRESHOLD=${STEADY_STATE_THRESHOLD:-2%}

# Default steady-state window = 60s
# If the running pod count stays within the given threshold for this time
# period, return CPU utilization to normal before the maximum wait time has
# expires
STEADY_STATE_WINDOW=${STEADY_STATE_WINDOW:-60}

# Default steady-state allows any pod count to be "steady state"
# Increasing this will skip any steady-state checks until the count rises above
# this number to avoid false positives if there are some periods where the
# count doesn't increase but we know we can't be at steady-state yet.
STEADY_STATE_MINIMUM=${STEADY_STATE_MINIMUM:-0}

#######################################################

KUBELET_CPU_STATE=/var/lib/kubelet/cpu_manager_state
FULL_CPU_STATE=/sys/fs/cgroup/cpuset/cpuset.cpus
KUBELET_CONF=/etc/kubernetes/kubelet.conf
unrestrictedCpuset() {
  local cpus
  if [[ -e $KUBELET_CPU_STATE ]]; then
    cpus=$(jq -r '.defaultCpuSet' <$KUBELET_CPU_STATE)
    if [[ -n "${cpus}" && -e ${KUBELET_CONF} ]]; then
      reserved_cpus=$(jq -r '.reservedSystemCPUs' </etc/kubernetes/kubelet.conf)
      if [[ -n "${reserved_cpus}" ]]; then
        # Use taskset to merge the two cpusets
        cpus=$(taskset -c "${reserved_cpus},${cpus}" grep -i Cpus_allowed_list /proc/self/status | awk '{print $2}')
      fi
    fi
  fi
  if [[ -z $cpus ]]; then
    # fall back to using all cpus if the kubelet state is not configured yet
    [[ -e $FULL_CPU_STATE ]] || return 1
    cpus=$(<$FULL_CPU_STATE)
  fi
  echo $cpus
}

restrictedCpuset() {
  for arg in $(</proc/cmdline); do
    if [[ $arg =~ ^systemd.cpu_affinity= ]]; then
      echo ${arg#*=}
      return 0
    fi
  done
  return 1
}

resetAffinity() {
  local cpuset="$1"
  local failcount=0
  local successcount=0
  logger "Recovery: Setting CPU affinity for critical processes \"$CRITICAL_PROCESSES\" to $cpuset"
  for proc in $CRITICAL_PROCESSES; do
    local pids="$(pgrep $proc)"
    for pid in $pids; do
      local tasksetOutput
      tasksetOutput="$(taskset -apc "$cpuset" $pid 2>&1)"
      if [[ $? -ne 0 ]]; then
        echo "ERROR: $tasksetOutput"
        ((failcount++))
      else
        ((successcount++))
      fi
    done
  done

  logger "Recovery: Re-affined $successcount pids successfully"
  if [[ $failcount -gt 0 ]]; then
    logger "Recovery: Failed to re-affine $failcount processes"
    return 1
  fi
}

setUnrestricted() {
  logger "Recovery: Setting critical system processes to have unrestricted CPU access"
  resetAffinity "$(unrestrictedCpuset)"
}

setRestricted() {
  logger "Recovery: Resetting critical system processes back to normally restricted access"
  resetAffinity "$(restrictedCpuset)"
}

currentAffinity() {
  local pid="$1"
  taskset -pc $pid | awk -F': ' '{print $2}'
}

within() {
  local last=$1 current=$2 threshold=$3
  local delta=0 pchange
  delta=$(( current - last ))
  if [[ $current -eq $last ]]; then
    pchange=0
  elif [[ $last -eq 0 ]]; then
    pchange=1000000
  else
    pchange=$(( ( $delta * 100) / last ))
  fi
  echo -n "last:$last current:$current delta:$delta pchange:${pchange}%: "
  local absolute limit
  case $threshold in
    *%)
      absolute=${pchange##-} # absolute value
      limit=${threshold%%%}
      ;;
    *)
      absolute=${delta##-} # absolute value
      limit=$threshold
      ;;
  esac
  if [[ $absolute -le $limit ]]; then
    echo "within (+/-)$threshold"
    return 0
  else
    echo "outside (+/-)$threshold"
    return 1
  fi
}

steadystate() {
  local last=$1 current=$2
  if [[ $last -lt $STEADY_STATE_MINIMUM ]]; then
    echo "last:$last current:$current Waiting to reach $STEADY_STATE_MINIMUM before checking for steady-state"
    return 1
  fi
  within $last $current $STEADY_STATE_THRESHOLD
}

waitForReady() {
  logger "Recovery: Waiting ${MAXIMUM_WAIT_TIME}s for the initialization to complete"
  local lastSystemdCpuset="$(currentAffinity 1)"
  local lastDesiredCpuset="$(unrestrictedCpuset)"
  local t=0 s=10
  local lastCcount=0 ccount=0 steadyStateTime=0
  while [[ $t -lt $MAXIMUM_WAIT_TIME ]]; do
    sleep $s
    ((t += s))
    # Re-check the current affinity of systemd, in case some other process has changed it
    local systemdCpuset="$(currentAffinity 1)"
    # Re-check the unrestricted Cpuset, as the allowed set of unreserved cores may change as pods are assigned to cores
    local desiredCpuset="$(unrestrictedCpuset)"
    if [[ $systemdCpuset != $lastSystemdCpuset || $lastDesiredCpuset != $desiredCpuset ]]; then
      resetAffinity "$desiredCpuset"
      lastSystemdCpuset="$(currentAffinity 1)"
      lastDesiredCpuset="$desiredCpuset"
    fi

    # Detect steady-state pod count
    ccount=$(crictl ps | wc -l)
    if steadystate $lastCcount $ccount; then
      ((steadyStateTime += s))
      echo "Steady-state for ${steadyStateTime}s/${STEADY_STATE_WINDOW}s"
      if [[ $steadyStateTime -ge $STEADY_STATE_WINDOW ]]; then
        logger "Recovery: Steady-state (+/- $STEADY_STATE_THRESHOLD) for ${STEADY_STATE_WINDOW}s: Done"
        return 0
      fi
    else
      if [[ $steadyStateTime -gt 0 ]]; then
        echo "Resetting steady-state timer"
        steadyStateTime=0
      fi
    fi
    lastCcount=$ccount
  done
  logger "Recovery: Recovery Complete Timeout"
}

main() {
  if ! unrestrictedCpuset >&/dev/null; then
    logger "Recovery: No unrestricted Cpuset could be detected"
    return 1
  fi

  if ! restrictedCpuset >&/dev/null; then
    logger "Recovery: No restricted Cpuset has been configured.  We are already running unrestricted."
    return 0
  fi

  # Ensure we reset the CPU affinity when we exit this script for any reason
  # This way either after the timer expires or after the process is interrupted
  # via ^C or SIGTERM, we return things back to the way they should be.
  trap setRestricted EXIT

  logger "Recovery: Recovery Mode Starting"
  setUnrestricted
  waitForReady
}

if [[ "${BASH_SOURCE[0]}" = "${0}" ]]; then
  main "${@}"
  exit $?
fi

  15. mode: 493
  16. path: /usr/local/bin/accelerated-container-startup.sh
  17. systemd:
  18. units:
  19. - contents: |
  20. [Unit]
  21. Description=Unlocks more CPUs for critical system processes during container startup
  22. [Service]
  23. Type=simple
  24. ExecStart=/usr/local/bin/accelerated-container-startup.sh
  25. # Maximum wait time is 600s = 10m:
  26. Environment=MAXIMUM_WAIT_TIME=600
  27. # Steady-state threshold = 2%
  28. # Allowed values:
  29. # 4 - absolute pod count (+/-)
  30. # 4% - percent change (+/-)
  31. # -1 - disable the steady-state check
  32. # Note: '%' must be escaped as '%%' in systemd unit files
  33. Environment=STEADY_STATE_THRESHOLD=2%%
  34. # Steady-state window = 120s
  35. # If the running pod count stays within the given threshold for this time
  36. # period, return CPU utilization to normal before the maximum wait time has
  37. # expires
  38. Environment=STEADY_STATE_WINDOW=120
  39. # Steady-state minimum = 40
  40. # Increasing this will skip any steady-state checks until the count rises above
  41. # this number to avoid false positives if there are some periods where the
  42. # count doesn't increase but we know we can't be at steady-state yet.
  43. Environment=STEADY_STATE_MINIMUM=40
  44. [Install]
  45. WantedBy=multi-user.target
  46. enabled: true
  47. name: accelerated-container-startup.service
  48. - contents: |
  49. [Unit]
  50. Description=Unlocks more CPUs for critical system processes during container shutdown
  51. DefaultDependencies=no
  52. [Service]
  53. Type=simple
  54. ExecStart=/usr/local/bin/accelerated-container-startup.sh
  55. # Maximum wait time is 600s = 10m:
  56. Environment=MAXIMUM_WAIT_TIME=600
  57. # Steady-state threshold
  58. # Allowed values:
  59. # 4 - absolute pod count (+/-)
  60. # 4% - percent change (+/-)
  61. # -1 - disable the steady-state check
  62. # Note: '%' must be escaped as '%%' in systemd unit files
  63. Environment=STEADY_STATE_THRESHOLD=-1
  64. # Steady-state window = 60s
  65. # If the running pod count stays within the given threshold for this time
  66. # period, return CPU utilization to normal before the maximum wait time has
  67. # expires
  68. Environment=STEADY_STATE_WINDOW=60
  69. [Install]
  70. WantedBy=shutdown.target reboot.target halt.target
  71. enabled: true
  72. name: accelerated-container-shutdown.service

Setting rcu_normal

The following MachineConfig CR configures the system to set rcu_normal to 1 after the system has finished startup. This improves kernel latency for vDU applications.

Recommended configuration for disabling rcu_expedited after the node has finished startup (08-set-rcu-normal-master.yaml)

  1. apiVersion: machineconfiguration.openshift.io/v1
  2. kind: MachineConfig
  3. metadata:
  4. labels:
  5. machineconfiguration.openshift.io/role: master
  6. name: 08-set-rcu-normal-master
  7. spec:
  8. config:
  9. ignition:
  10. version: 3.2.0
  11. storage:
  12. files:
  13. - contents:
  14. source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKIwojIERpc2FibGUgcmN1X2V4cGVkaXRlZCBhZnRlciBub2RlIGhhcyBmaW5pc2hlZCBib290aW5nCiMKIyBUaGUgZGVmYXVsdHMgYmVsb3cgY2FuIGJlIG92ZXJyaWRkZW4gdmlhIGVudmlyb25tZW50IHZhcmlhYmxlcwojCgojIERlZmF1bHQgd2FpdCB0aW1lIGlzIDYwMHMgPSAxMG06Ck1BWElNVU1fV0FJVF9USU1FPSR7TUFYSU1VTV9XQUlUX1RJTUU6LTYwMH0KCiMgRGVmYXVsdCBzdGVhZHktc3RhdGUgdGhyZXNob2xkID0gMiUKIyBBbGxvd2VkIHZhbHVlczoKIyAgNCAgLSBhYnNvbHV0ZSBwb2QgY291bnQgKCsvLSkKIyAgNCUgLSBwZXJjZW50IGNoYW5nZSAoKy8tKQojICAtMSAtIGRpc2FibGUgdGhlIHN0ZWFkeS1zdGF0ZSBjaGVjawpTVEVBRFlfU1RBVEVfVEhSRVNIT0xEPSR7U1RFQURZX1NUQVRFX1RIUkVTSE9MRDotMiV9CgojIERlZmF1bHQgc3RlYWR5LXN0YXRlIHdpbmRvdyA9IDYwcwojIElmIHRoZSBydW5uaW5nIHBvZCBjb3VudCBzdGF5cyB3aXRoaW4gdGhlIGdpdmVuIHRocmVzaG9sZCBmb3IgdGhpcyB0aW1lCiMgcGVyaW9kLCByZXR1cm4gQ1BVIHV0aWxpemF0aW9uIHRvIG5vcm1hbCBiZWZvcmUgdGhlIG1heGltdW0gd2FpdCB0aW1lIGhhcwojIGV4cGlyZXMKU1RFQURZX1NUQVRFX1dJTkRPVz0ke1NURUFEWV9TVEFURV9XSU5ET1c6LTYwfQoKIyBEZWZhdWx0IHN0ZWFkeS1zdGF0ZSBhbGxvd3MgYW55IHBvZCBjb3VudCB0byBiZSAic3RlYWR5IHN0YXRlIgojIEluY3JlYXNpbmcgdGhpcyB3aWxsIHNraXAgYW55IHN0ZWFkeS1zdGF0ZSBjaGVja3MgdW50aWwgdGhlIGNvdW50IHJpc2VzIGFib3ZlCiMgdGhpcyBudW1iZXIgdG8gYXZvaWQgZmFsc2UgcG9zaXRpdmVzIGlmIHRoZXJlIGFyZSBzb21lIHBlcmlvZHMgd2hlcmUgdGhlCiMgY291bnQgZG9lc24ndCBpbmNyZWFzZSBidXQgd2Uga25vdyB3ZSBjYW4ndCBiZSBhdCBzdGVhZHktc3RhdGUgeWV0LgpTVEVBRFlfU1RBVEVfTUlOSU1VTT0ke1NURUFEWV9TVEFURV9NSU5JTVVNOi0wfQoKIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIwoKd2l0aGluKCkgewogIGxvY2FsIGxhc3Q9JDEgY3VycmVudD0kMiB0aHJlc2hvbGQ9JDMKICBsb2NhbCBkZWx0YT0wIHBjaGFuZ2UKICBkZWx0YT0kKCggY3VycmVudCAtIGxhc3QgKSkKICBpZiBbWyAkY3VycmVudCAtZXEgJGxhc3QgXV07IHRoZW4KICAgIHBjaGFuZ2U9MAogIGVsaWYgW1sgJGxhc3QgLWVxIDAgXV07IHRoZW4KICAgIHBjaGFuZ2U9MTAwMDAwMAogIGVsc2UKICAgIHBjaGFuZ2U9JCgoICggIiRkZWx0YSIgKiAxMDApIC8gbGFzdCApKQogIGZpCiAgZWNobyAtbiAibGFzdDokbGFzdCBjdXJyZW50OiRjdXJyZW50IGRlbHRhOiRkZWx0YSBwY2hhbmdlOiR7cGNoYW5nZX0lOiAiCiAgbG9jYWwgYWJzb2x1dGUgbGltaXQKICBjYXNlICR0aHJlc2hvbGQgaW4KICAgIColKQogICAgICBhYnNvbHV0ZT0ke3BjaGFuZ2UjIy19ICMgYWJzb2x1dGUgdmFsdWUKICAgICAgbGltaXQ9JHt0aHJlc2hvbGQlJSV9CiAgICAgIDs7CiAgICAqKQogICAgICBhYnNvbHV0ZT0ke2RlbHRhIyMtfSAjIGFic29sdXRlIHZhbHVlCiAgICAgIGxpbWl0PSR0aHJlc2hvbGQKICAgICAgOzsKICBlc2FjCiAgaWYgW1sgJGFic29sdXRlIC1sZSAkbGltaXQgXV07IHRoZW4KICAgIGVjaG8gIndpdGhpbiAoKy8tKSR0aHJlc2hvbGQiCiAgICByZXR1cm4gMAogIGVsc2UKICAgIGVjaG8gIm91dHNpZGUgKCsvLSkkdGhyZXNob2xkIgogICAgcmV0dXJuIDEKICBmaQp9CgpzdGVhZHlzdGF0ZSgpIHsKICBsb2NhbCBsYXN0PSQxIGN1cnJlbnQ9JDIKICBpZiBbWyAkbGFzdCAtbHQgJFNURUFEWV9TVEFURV9NSU5JTVVNIF1dOyB0aGVuCiAgICBlY2hvICJsYXN0OiRsYXN0IGN1cnJlbnQ6JGN1cnJlbnQgV2FpdGluZyB0byByZWFjaCAkU1RFQURZX1NUQVRFX01JTklNVU0gYmVmb3JlIGNoZWNraW5nIGZvciBzdGVhZHktc3RhdGUiCiAgICByZXR1cm4gMQogIGZpCiAgd2l0aGluICIkbGFzdCIgIiRjdXJyZW50IiAiJFNURUFEWV9TVEFURV9USFJFU0hPTEQiCn0KCndhaXRGb3JSZWFkeSgpIHsKICBsb2dnZXIgIlJlY292ZXJ5OiBXYWl0aW5nICR7TUFYSU1VTV9XQUlUX1RJTUV9cyBmb3IgdGhlIGluaXRpYWxpemF0aW9uIHRvIGNvbXBsZXRlIgogIGxvY2FsIHQ9MCBzPTEwCiAgbG9jYWwgbGFzdENjb3VudD0wIGNjb3VudD0wIHN0ZWFkeVN0YXRlVGltZT0wCiAgd2hpbGUgW1sgJHQgLWx0ICRNQVhJTVVNX1dBSVRfVElNRSBdXTsgZG8KICAgIHNsZWVwICRzCiAgICAoKHQgKz0gcykpCiAgICAjIERldGVjdCBzdGVhZHktc3RhdGUgcG9kIGNvdW50CiAgICBjY291bnQ9JChjcmljdGwgcHMgMj4vZGV2L251bGwgfCB3YyAtbCkKICAgIGlmIFtbICRjY291bnQgLWd0IDAgXV0gJiYgc3RlYWR5c3RhdGUgIiRsYXN0Q2NvdW50IiAiJGNjb3VudCI7IHRoZW4KICAgICAgKChzdGVhZHlTdGF0ZVRpbWUgKz0gcykpCiAgICAgIGVjaG8gIlN0ZWFkeS1zdGF0ZSBmb3IgJHtzdGVhZHlTdGF0ZVRpbWV9cy8ke1NURUFEWV9TVEFURV9XSU5ET1d9cyIKICAgICAgaWYgW1sgJHN0ZWFkeVN0YXRlVGltZSAtZ2UgJFNURUFEWV9TVEFURV9XSU5ET1cgXV07IHRoZW4KICAgICAgICBs
b2dnZXIgIlJlY292ZXJ5OiBTdGVhZHktc3RhdGUgKCsvLSAkU1RFQURZX1NUQVRFX1RIUkVTSE9MRCkgZm9yICR7U1RFQURZX1NUQVRFX1dJTkRPV31zOiBEb25lIgogICAgICAgIHJldHVybiAwCiAgICAgIGZpCiAgICBlbHNlCiAgICAgIGlmIFtbICRzdGVhZHlTdGF0ZVRpbWUgLWd0IDAgXV07IHRoZW4KICAgICAgICBlY2hvICJSZXNldHRpbmcgc3RlYWR5LXN0YXRlIHRpbWVyIgogICAgICAgIHN0ZWFkeVN0YXRlVGltZT0wCiAgICAgIGZpCiAgICBmaQogICAgbGFzdENjb3VudD0kY2NvdW50CiAgZG9uZQogIGxvZ2dlciAiUmVjb3Zlcnk6IFJlY292ZXJ5IENvbXBsZXRlIFRpbWVvdXQiCn0KCnNldFJjdU5vcm1hbCgpIHsKICBlY2hvICJTZXR0aW5nIHJjdV9ub3JtYWwgdG8gMSIKICBlY2hvIDEgPiAvc3lzL2tlcm5lbC9yY3Vfbm9ybWFsCn0KCm1haW4oKSB7CiAgd2FpdEZvclJlYWR5CiAgZWNobyAiV2FpdGluZyBmb3Igc3RlYWR5IHN0YXRlIHRvb2s6ICQoYXdrICd7cHJpbnQgaW50KCQxLzM2MDApImgiLCBpbnQoKCQxJTM2MDApLzYwKSJtIiwgaW50KCQxJTYwKSJzIn0nIC9wcm9jL3VwdGltZSkiCiAgc2V0UmN1Tm9ybWFsCn0KCmlmIFtbICIke0JBU0hfU09VUkNFWzBdfSIgPSAiJHswfSIgXV07IHRoZW4KICBtYWluICIke0B9IgogIGV4aXQgJD8KZmkK
  15. mode: 493
  16. path: /usr/local/bin/set-rcu-normal.sh
  17. systemd:
  18. units:
  19. - contents: |
  20. [Unit]
  21. Description=Disable rcu_expedited after node has finished booting by setting rcu_normal to 1
  22. [Service]
  23. Type=simple
  24. ExecStart=/usr/local/bin/set-rcu-normal.sh
  25. # Maximum wait time is 600s = 10m:
  26. Environment=MAXIMUM_WAIT_TIME=600
  27. # Steady-state threshold = 2%
  28. # Allowed values:
  29. # 4 - absolute pod count (+/-)
  30. # 4% - percent change (+/-)
  31. # -1 - disable the steady-state check
  32. # Note: '%' must be escaped as '%%' in systemd unit files
  33. Environment=STEADY_STATE_THRESHOLD=2%%
  34. # Steady-state window = 120s
  35. # If the running pod count stays within the given threshold for this time
  36. # period, return CPU utilization to normal before the maximum wait time has
  37. # expires
  38. Environment=STEADY_STATE_WINDOW=120
  39. # Steady-state minimum = 40
  40. # Increasing this will skip any steady-state checks until the count rises above
  41. # this number to avoid false positives if there are some periods where the
  42. # count doesn't increase but we know we can't be at steady-state yet.
  43. Environment=STEADY_STATE_MINIMUM=40
  44. [Install]
  45. WantedBy=multi-user.target
  46. enabled: true
  47. name: set-rcu-normal.service

Automatic kernel crash dumps with kdump

kdump is a Linux kernel feature that creates a kernel crash dump when the kernel crashes. kdump is enabled with the following MachineConfig CRs.

Recommended MachineConfig CR to remove ice driver from control plane kdump logs (05-kdump-config-master.yaml)

  1. apiVersion: machineconfiguration.openshift.io/v1
  2. kind: MachineConfig
  3. metadata:
  4. labels:
  5. machineconfiguration.openshift.io/role: master
  6. name: 05-kdump-config-master
  7. spec:
  8. config:
  9. ignition:
  10. version: 3.2.0
  11. systemd:
  12. units:
  13. - enabled: true
  14. name: kdump-remove-ice-module.service
  15. contents: |
  16. [Unit]
  17. Description=Remove ice module when doing kdump
  18. Before=kdump.service
  19. [Service]
  20. Type=oneshot
  21. RemainAfterExit=true
  22. ExecStart=/usr/local/bin/kdump-remove-ice-module.sh
  23. [Install]
  24. WantedBy=multi-user.target
  25. storage:
  26. files:
  27. - contents:
  28. source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo=
  29. mode: 448
  30. path: /usr/local/bin/kdump-remove-ice-module.sh

Recommended control plane node kdump configuration (06-kdump-master.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 06-kdump-enable-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump.service
  kernelArguments:
    - crashkernel=512M

Recommended MachineConfig CR to remove ice driver from worker node kdump logs (05-kdump-config-worker.yaml)

  1. apiVersion: machineconfiguration.openshift.io/v1
  2. kind: MachineConfig
  3. metadata:
  4. labels:
  5. machineconfiguration.openshift.io/role: worker
  6. name: 05-kdump-config-worker
  7. spec:
  8. config:
  9. ignition:
  10. version: 3.2.0
  11. systemd:
  12. units:
  13. - enabled: true
  14. name: kdump-remove-ice-module.service
  15. contents: |
  16. [Unit]
  17. Description=Remove ice module when doing kdump
  18. Before=kdump.service
  19. [Service]
  20. Type=oneshot
  21. RemainAfterExit=true
  22. ExecStart=/usr/local/bin/kdump-remove-ice-module.sh
  23. [Install]
  24. WantedBy=multi-user.target
  25. storage:
  26. files:
  27. - contents:
  28. source: data:text/plain;charset=utf-8;base64,IyEvdXNyL2Jpbi9lbnYgYmFzaAoKIyBUaGlzIHNjcmlwdCByZW1vdmVzIHRoZSBpY2UgbW9kdWxlIGZyb20ga2R1bXAgdG8gcHJldmVudCBrZHVtcCBmYWlsdXJlcyBvbiBjZXJ0YWluIHNlcnZlcnMuCiMgVGhpcyBpcyBhIHRlbXBvcmFyeSB3b3JrYXJvdW5kIGZvciBSSEVMUExBTi0xMzgyMzYgYW5kIGNhbiBiZSByZW1vdmVkIHdoZW4gdGhhdCBpc3N1ZSBpcwojIGZpeGVkLgoKc2V0IC14CgpTRUQ9Ii91c3IvYmluL3NlZCIKR1JFUD0iL3Vzci9iaW4vZ3JlcCIKCiMgb3ZlcnJpZGUgZm9yIHRlc3RpbmcgcHVycG9zZXMKS0RVTVBfQ09ORj0iJHsxOi0vZXRjL3N5c2NvbmZpZy9rZHVtcH0iClJFTU9WRV9JQ0VfU1RSPSJtb2R1bGVfYmxhY2tsaXN0PWljZSIKCiMgZXhpdCBpZiBmaWxlIGRvZXNuJ3QgZXhpc3QKWyAhIC1mICR7S0RVTVBfQ09ORn0gXSAmJiBleGl0IDAKCiMgZXhpdCBpZiBmaWxlIGFscmVhZHkgdXBkYXRlZAoke0dSRVB9IC1GcSAke1JFTU9WRV9JQ0VfU1RSfSAke0tEVU1QX0NPTkZ9ICYmIGV4aXQgMAoKIyBUYXJnZXQgbGluZSBsb29rcyBzb21ldGhpbmcgbGlrZSB0aGlzOgojIEtEVU1QX0NPTU1BTkRMSU5FX0FQUEVORD0iaXJxcG9sbCBucl9jcHVzPTEgLi4uIGhlc3RfZGlzYWJsZSIKIyBVc2Ugc2VkIHRvIG1hdGNoIGV2ZXJ5dGhpbmcgYmV0d2VlbiB0aGUgcXVvdGVzIGFuZCBhcHBlbmQgdGhlIFJFTU9WRV9JQ0VfU1RSIHRvIGl0CiR7U0VEfSAtaSAncy9eS0RVTVBfQ09NTUFORExJTkVfQVBQRU5EPSJbXiJdKi8mICcke1JFTU9WRV9JQ0VfU1RSfScvJyAke0tEVU1QX0NPTkZ9IHx8IGV4aXQgMAo=
  29. mode: 448
  30. path: /usr/local/bin/kdump-remove-ice-module.sh

Recommended kdump worker node configuration (06-kdump-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 06-kdump-enable-worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - enabled: true
          name: kdump.service
  kernelArguments:
    - crashkernel=512M

Disable automatic CRI-O cache wipe

After an uncontrolled host shutdown or cluster reboot, CRI-O automatically deletes the entire CRI-O cache, causing all images to be pulled from the registry when the node reboots. This can result in unacceptably slow recovery times or recovery failures. To prevent this from happening in single-node OpenShift clusters that you install with GitOps ZTP, disable the CRI-O delete cache feature during cluster installation.

Recommended MachineConfig CR to disable CRI-O cache wipe on control plane nodes (99-crio-disable-wipe-master.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-crio-disable-wipe-master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml

Recommended MachineConfig CR to disable CRI-O cache wipe on worker nodes (99-crio-disable-wipe-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-crio-disable-wipe-worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmNsZWFuX3NodXRkb3duX2ZpbGUgPSAiIgo=
          mode: 420
          path: /etc/crio/crio.conf.d/99-crio-disable-wipe.toml
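
For reference, the base64-encoded file contents in both CRs decode to the following CRI-O drop-in configuration. Setting clean_shutdown_file to an empty string stops CRI-O from treating a missing clean-shutdown marker as a reason to wipe its image and container storage after an unclean shutdown:

[crio]
clean_shutdown_file = ""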

Configuring crun as the default container runtime

The following ContainerRuntimeConfig custom resources (CRs) configure crun as the default OCI container runtime for control plane and worker nodes. The crun container runtime is fast and lightweight and has a low memory footprint.

For optimal performance, enable crun for control plane and worker nodes in single-node OpenShift, three-node OpenShift, and standard clusters. To avoid the cluster rebooting when the CR is applied, apply the change as a GitOps ZTP additional Day 0 install-time manifest.

Recommended ContainerRuntimeConfig CR for control plane nodes (enable-crun-master.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-master
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  containerRuntimeConfig:
    defaultRuntime: crun

Recommended ContainerRuntimeConfig CR for worker nodes (enable-crun-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-crun-worker
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    defaultRuntime: crun

When the cluster installation is complete, the ZTP pipeline applies the following custom resources (CRs) that are required to run DU workloads.

In GitOps ZTP v4.10 and earlier, you configure UEFI secure boot with a MachineConfig CR. This is no longer required in GitOps ZTP v4.11 and later. In v4.11, you configure UEFI secure boot for single-node OpenShift clusters by updating the spec.clusters.nodes.bootMode field in the SiteConfig CR that you use to install the cluster. For more information, see Deploying a managed cluster with SiteConfig and GitOps ZTP.
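
For example, the following SiteConfig fragment is a minimal sketch of the relevant fields, using a hypothetical cluster and host name:

apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: "example-sno"
  namespace: "example-sno"
spec:
  clusters:
    - clusterName: "example-sno"
      nodes:
        - hostName: "example-node.example.com"
          # Valid values include "UEFI" (the default) and "UEFISecureBoot"
          bootMode: "UEFISecureBoot"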

Operators

Single-node OpenShift clusters that run DU workloads require the following Operators to be installed:

  • Local Storage Operator

  • Logging Operator

  • PTP Operator

  • SR-IOV Network Operator

You also need to configure a custom CatalogSource CR, disable the default OperatorHub configuration, and configure an ImageContentSourcePolicy mirror registry that is accessible from the clusters that you install.

Recommended Storage Operator namespace and Operator group configuration (StorageNS.yaml, StorageOperGroup.yaml)

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-local-storage
  namespace: openshift-local-storage
  annotations: {}
spec:
  targetNamespaces:
    - openshift-local-storage

Recommended Cluster Logging Operator namespace and Operator group configuration (ClusterLogNS.yaml, ClusterLogOperGroup.yaml)

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
  annotations: {}
spec:
  targetNamespaces:
    - openshift-logging

Recommended PTP Operator namespace and Operator group configuration (PtpSubscriptionNS.yaml, PtpSubscriptionOperGroup.yaml)

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-ptp
  annotations:
    workload.openshift.io/allowed: management
  labels:
    openshift.io/cluster-monitoring: "true"
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: ptp-operators
  namespace: openshift-ptp
  annotations: {}
spec:
  targetNamespaces:
    - openshift-ptp

Recommended SR-IOV Operator namespace and Operator group configuration (SriovSubscriptionNS.yaml, SriovSubscriptionOperGroup.yaml)

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sriov-network-operator
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sriov-network-operators
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  targetNamespaces:
    - openshift-sriov-network-operator

Recommended CatalogSource configuration (DefaultCatsrc.yaml)

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: default-cat-source
  namespace: openshift-marketplace
  annotations:
    target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
spec:
  displayName: default-cat-source
  image: $imageUrl
  publisher: Red Hat
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 1h
status:
  connectionState:
    lastObservedState: READY
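
The $imageUrl placeholder is populated by the GitOps ZTP pipeline with the index image for your disconnected registry. As a sketch, assuming a hypothetical registry at registry.example.com:5000, the field might resolve to:

image: registry.example.com:5000/olm/redhat-operator-index:v4.14   # hypothetical disconnected index image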

Recommended ImageContentSourcePolicy configuration (DisconnectedICSP.yaml)

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: disconnected-internal-icsp
  annotations: {}
spec:
  repositoryDigestMirrors:
    - $mirrors
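
The $mirrors placeholder is populated by the GitOps ZTP pipeline. As a sketch, a single entry might look like the following, where the mirror registry host is hypothetical:

repositoryDigestMirrors:
  - mirrors:
      - registry.example.com:5000/openshift4   # hypothetical mirror registry
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev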

Recommended OperatorHub configuration (OperatorHub.yaml)

apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
  annotations: {}
spec:
  disableAllDefaultSources: true

Operator subscriptions

Single-node OpenShift clusters that run DU workloads require the following Subscription CRs. The subscription provides the location to download the following Operators:

  • Local Storage Operator

  • Logging Operator

  • PTP Operator

  • SR-IOV Network Operator

  • SRIOV-FEC Operator

For each Operator subscription, specify the channel to get the Operator from. The recommended channel is stable.

You can specify Manual or Automatic updates. In Automatic mode, the Operator automatically updates to the latest versions in the channel as they become available in the registry. In Manual mode, new Operator versions are installed only when they are explicitly approved.

Use Manual mode for subscriptions. This allows you to control the timing of Operator updates to fit within scheduled maintenance windows.
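
With Manual approval, each pending update generates an InstallPlan that waits for approval, and you approve it by setting spec.approved to true on the generated InstallPlan resource. The following is a minimal sketch only; the InstallPlan name and CSV version are hypothetical:

apiVersion: operators.coreos.com/v1alpha1
kind: InstallPlan
metadata:
  name: install-abc12          # hypothetical generated name
  namespace: openshift-local-storage
spec:
  approved: true               # approve the pending Operator update
  clusterServiceVersionNames:
    - local-storage-operator.v4.14.0   # hypothetical CSV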

Recommended Local Storage Operator subscription (StorageSubscription.yaml)

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
  annotations: {}
spec:
  channel: "stable"
  name: local-storage-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended SR-IOV Operator subscription (SriovSubscription.yaml)

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator-subscription
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  channel: "stable"
  name: sriov-network-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended PTP Operator subscription (PtpSubscription.yaml)

---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ptp-operator-subscription
  namespace: openshift-ptp
  annotations: {}
spec:
  channel: "stable"
  name: ptp-operator
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Recommended Cluster Logging Operator subscription (ClusterLogSubscription.yaml)

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
  annotations: {}
spec:
  channel: "stable"
  name: cluster-logging
  source: redhat-operators-disconnected
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual
status:
  state: AtLatestKnown

Cluster logging and log forwarding

Single-node OpenShift clusters that run DU workloads require logging and log forwarding for debugging. The following ClusterLogging and ClusterLogForwarder custom resources (CRs) are required.

Recommended cluster logging configuration (ClusterLogging.yaml)

apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
  annotations: {}
spec:
  managementState: "Managed"
  collection:
    logs:
      type: "vector"

Recommended log forwarding configuration (ClusterLogForwarder.yaml)

apiVersion: "logging.openshift.io/v1"
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
  annotations: {}
spec:
  outputs: $outputs
  pipelines: $pipelines

Set the spec.outputs.url field to the URL of the Kafka server where the logs are forwarded.
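
For example, the following is a minimal sketch of the $outputs and $pipelines values for forwarding audit and infrastructure logs to a hypothetical Kafka broker at kafka.example.com with a topic named ran-du-logs:

outputs:
  - type: kafka
    name: kafka-open
    # Hypothetical broker URL and topic; set this to your Kafka server
    url: tcp://kafka.example.com:9092/ran-du-logs
pipelines:
  - name: all-logs-to-kafka
    inputRefs:
      - audit
      - infrastructure
    outputRefs:
      - kafka-open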

Performance profile

Single-node OpenShift clusters that run DU workloads require a Node Tuning Operator performance profile to use real-time host capabilities and services.

In earlier versions of OKD, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OKD 4.11 and later, this functionality is part of the Node Tuning Operator.

The following example PerformanceProfile CR illustrates the required single-node OpenShift cluster configuration.

Recommended performance profile configuration (PerformanceProfile.yaml)

apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  # if you change this name make sure the 'include' line in TunedPerformancePatch.yaml
  # matches this name: include=openshift-node-performance-${PerformanceProfile.metadata.name}
  # Also in file 'validatorCRs/informDuValidator.yaml':
  # name: 50-performance-${PerformanceProfile.metadata.name}
  name: openshift-node-performance-profile
  annotations:
    ran.openshift.io/reference-configuration: "ran-du.redhat.com"
spec:
  additionalKernelArgs:
    - "rcupdate.rcu_normal_after_boot=0"
    - "efi=runtime"
    - "vfio_pci.enable_sriov=1"
    - "vfio_pci.disable_idle_d3=1"
    - "module_blacklist=irdma"
  cpu:
    isolated: $isolated
    reserved: $reserved
  hugepages:
    defaultHugepagesSize: $defaultHugepagesSize
    pages:
      - size: $size
        count: $count
        node: $node
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/$mcp: ""
  nodeSelector:
    node-role.kubernetes.io/$mcp: ""
  numa:
    topologyPolicy: "restricted"
  # To use the standard (non-realtime) kernel, set enabled to false
  realTimeKernel:
    enabled: true
  workloadHints:
    # WorkloadHints defines the set of upper level flags for different type of workloads.
    # See https://github.com/openshift/cluster-node-tuning-operator/blob/master/docs/performanceprofile/performance_profile.md#workloadhints
    # for detailed descriptions of each item.
    # The configuration below is set for a low latency, performance mode.
    realTime: true
    highPowerConsumption: false
    perPodPowerManagement: false
Table 3. PerformanceProfile CR options for single-node OpenShift clusters
PerformanceProfile CR field | Description

metadata.name

Ensure that name matches the following fields set in related GitOps ZTP custom resources (CRs):

  • include=openshift-node-performance-${PerformanceProfile.metadata.name} in TunedPerformancePatch.yaml

  • name: 50-performance-${PerformanceProfile.metadata.name} in validatorCRs/informDuValidator.yaml

spec.additionalKernelArgs

Setting "efi=runtime" configures UEFI secure boot for the cluster host.

spec.cpu.isolated

Set the isolated CPUs. Ensure all of the Hyper-Threading pairs match.

The reserved and isolated CPU pools must not overlap and together must span all available cores. CPU cores that are not accounted for cause an undefined behaviour in the system.

spec.cpu.reserved

Set the reserved CPUs. When workload partitioning is enabled, system processes, kernel threads, and system container threads are restricted to these CPUs. All CPUs that are not isolated should be reserved.

spec.hugepages.pages

  • Set the number of huge pages (count).

  • Set the huge pages size (size).

  • Set node to the NUMA node where the huge pages are allocated (node).

See the sketch after this table for example values.

spec.realTimeKernel

Set enabled to true to use the realtime kernel.

spec.workloadHints

Use workloadHints to define the set of top-level flags for different types of workloads. The example configuration configures the cluster for low latency and high performance.
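
The $isolated, $reserved, $defaultHugepagesSize, $size, $count, $node, and $mcp placeholders in the PerformanceProfile CR are populated by the GitOps ZTP pipeline. The following fragment is a minimal sketch of typical values, assuming a hypothetical 32-core host with SMT enabled, four reserved cores, and 1G huge pages allocated on NUMA node 0; adjust all values to match your hardware:

spec:
  cpu:
    # Reserved and isolated sets must not overlap and must cover all CPUs
    reserved: "0-3,32-35"
    isolated: "4-31,36-63"
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - size: 1G
        count: 32
        node: 0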

Configuring cluster time synchronization

Run a one-time system time synchronization job for control plane or worker nodes.

Recommended one time time-sync for control plane nodes (99-sync-time-once-master.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-sync-time-once-master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Sync time once
            After=network.service
            [Service]
            Type=oneshot
            TimeoutStartSec=300
            ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: sync-time-once.service

Recommended one time time-sync for worker nodes (99-sync-time-once-worker.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-sync-time-once-worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - contents: |
            [Unit]
            Description=Sync time once
            After=network.service
            [Service]
            Type=oneshot
            TimeoutStartSec=300
            ExecStart=/usr/sbin/chronyd -n -f /etc/chrony.conf -q
            RemainAfterExit=yes
            [Install]
            WantedBy=multi-user.target
          enabled: true
          name: sync-time-once.service

PTP

Single-node OpenShift clusters use Precision Time Protocol (PTP) for network time synchronization. The following example PtpConfig CRs illustrate the required PTP configurations for ordinary clocks, boundary clocks, and grandmaster clocks. The exact configuration you apply will depend on the node hardware and specific use case.

Recommended PTP ordinary clock configuration (PtpConfigSlave.yaml)

  1. apiVersion: ptp.openshift.io/v1
  2. kind: PtpConfig
  3. metadata:
  4. name: slave
  5. namespace: openshift-ptp
  6. annotations: {}
  7. spec:
  8. profile:
  9. - name: "slave"
  10. # The interface name is hardware-specific
  11. interface: $interface
  12. ptp4lOpts: "-2 -s"
  13. phc2sysOpts: "-a -r -n 24"
  14. ptpSchedulingPolicy: SCHED_FIFO
  15. ptpSchedulingPriority: 10
  16. ptpSettings:
  17. logReduce: "true"
  18. ptp4lConf: |
  19. [global]
  20. #
  21. # Default Data Set
  22. #
  23. twoStepFlag 1
  24. slaveOnly 1
  25. priority1 128
  26. priority2 128
  27. domainNumber 24
  28. #utc_offset 37
  29. clockClass 255
  30. clockAccuracy 0xFE
  31. offsetScaledLogVariance 0xFFFF
  32. free_running 0
  33. freq_est_interval 1
  34. dscp_event 0
  35. dscp_general 0
  36. dataset_comparison G.8275.x
  37. G.8275.defaultDS.localPriority 128
  38. #
  39. # Port Data Set
  40. #
  41. logAnnounceInterval -3
  42. logSyncInterval -4
  43. logMinDelayReqInterval -4
  44. logMinPdelayReqInterval -4
  45. announceReceiptTimeout 3
  46. syncReceiptTimeout 0
  47. delayAsymmetry 0
  48. fault_reset_interval -4
  49. neighborPropDelayThresh 20000000
  50. masterOnly 0
  51. G.8275.portDS.localPriority 128
  52. #
  53. # Run time options
  54. #
  55. assume_two_step 0
  56. logging_level 6
  57. path_trace_enabled 0
  58. follow_up_info 0
  59. hybrid_e2e 0
  60. inhibit_multicast_service 0
  61. net_sync_monitor 0
  62. tc_spanning_tree 0
  63. tx_timestamp_timeout 50
  64. unicast_listen 0
  65. unicast_master_table 0
  66. unicast_req_duration 3600
  67. use_syslog 1
  68. verbose 0
  69. summary_interval 0
  70. kernel_leap 1
  71. check_fup_sync 0
  72. clock_class_threshold 7
  73. #
  74. # Servo Options
  75. #
  76. pi_proportional_const 0.0
  77. pi_integral_const 0.0
  78. pi_proportional_scale 0.0
  79. pi_proportional_exponent -0.3
  80. pi_proportional_norm_max 0.7
  81. pi_integral_scale 0.0
  82. pi_integral_exponent 0.4
  83. pi_integral_norm_max 0.3
  84. step_threshold 2.0
  85. first_step_threshold 0.00002
  86. max_frequency 900000000
  87. clock_servo pi
  88. sanity_freq_limit 200000000
  89. ntpshm_segment 0
  90. #
  91. # Transport options
  92. #
  93. transportSpecific 0x0
  94. ptp_dst_mac 01:1B:19:00:00:00
  95. p2p_dst_mac 01:80:C2:00:00:0E
  96. udp_ttl 1
  97. udp6_scope 0x0E
  98. uds_address /var/run/ptp4l
  99. #
  100. # Default interface options
  101. #
  102. clock_type OC
  103. network_transport L2
  104. delay_mechanism E2E
  105. time_stamping hardware
  106. tsproc_mode filter
  107. delay_filter moving_median
  108. delay_filter_length 10
  109. egressLatency 0
  110. ingressLatency 0
  111. boundary_clock_jbod 0
  112. #
  113. # Clock description
  114. #
  115. productDescription ;;
  116. revisionData ;;
  117. manufacturerIdentity 00:00:00
  118. userDescription ;
  119. timeSource 0xA0
  120. recommend:
  121. - profile: "slave"
  122. priority: 4
  123. match:
  124. - nodeLabel: "node-role.kubernetes.io/$mcp"

Recommended boundary clock configuration (PtpConfigBoundary.yaml)

apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: boundary
  namespace: openshift-ptp
  annotations: {}
spec:
  profile:
    - name: "boundary"
      ptp4lOpts: "-2"
      phc2sysOpts: "-a -r -n 24"
      ptpSchedulingPolicy: SCHED_FIFO
      ptpSchedulingPriority: 10
      ptpSettings:
        logReduce: "true"
      ptp4lConf: |
        # The interface name is hardware-specific
        [$iface_slave]
        masterOnly 0
        [$iface_master_1]
        masterOnly 1
        [$iface_master_2]
        masterOnly 1
        [$iface_master_3]
        masterOnly 1
        [global]
        #
        # Default Data Set
        #
        twoStepFlag 1
        slaveOnly 0
        priority1 128
        priority2 128
        domainNumber 24
        #utc_offset 37
        clockClass 248
        clockAccuracy 0xFE
        offsetScaledLogVariance 0xFFFF
        free_running 0
        freq_est_interval 1
        dscp_event 0
        dscp_general 0
        dataset_comparison G.8275.x
        G.8275.defaultDS.localPriority 128
        #
        # Port Data Set
        #
        logAnnounceInterval -3
        logSyncInterval -4
        logMinDelayReqInterval -4
        logMinPdelayReqInterval -4
        announceReceiptTimeout 3
        syncReceiptTimeout 0
        delayAsymmetry 0
        fault_reset_interval -4
        neighborPropDelayThresh 20000000
        masterOnly 0
        G.8275.portDS.localPriority 128
        #
        # Run time options
        #
        assume_two_step 0
        logging_level 6
        path_trace_enabled 0
        follow_up_info 0
        hybrid_e2e 0
        inhibit_multicast_service 0
        net_sync_monitor 0
        tc_spanning_tree 0
        tx_timestamp_timeout 50
        unicast_listen 0
        unicast_master_table 0
        unicast_req_duration 3600
        use_syslog 1
        verbose 0
        summary_interval 0
        kernel_leap 1
        check_fup_sync 0
        clock_class_threshold 135
        #
        # Servo Options
        #
        pi_proportional_const 0.0
        pi_integral_const 0.0
        pi_proportional_scale 0.0
        pi_proportional_exponent -0.3
        pi_proportional_norm_max 0.7
        pi_integral_scale 0.0
        pi_integral_exponent 0.4
        pi_integral_norm_max 0.3
        step_threshold 2.0
        first_step_threshold 0.00002
        max_frequency 900000000
        clock_servo pi
        sanity_freq_limit 200000000
        ntpshm_segment 0
        #
        # Transport options
        #
        transportSpecific 0x0
        ptp_dst_mac 01:1B:19:00:00:00
        p2p_dst_mac 01:80:C2:00:00:0E
        udp_ttl 1
        udp6_scope 0x0E
        uds_address /var/run/ptp4l
        #
        # Default interface options
        #
        clock_type BC
        network_transport L2
        delay_mechanism E2E
        time_stamping hardware
        tsproc_mode filter
        delay_filter moving_median
        delay_filter_length 10
        egressLatency 0
        ingressLatency 0
        boundary_clock_jbod 0
        #
        # Clock description
        #
        productDescription ;;
        revisionData ;;
        manufacturerIdentity 00:00:00
        userDescription ;
        timeSource 0xA0
  recommend:
    - profile: "boundary"
      priority: 4
      match:
        - nodeLabel: "node-role.kubernetes.io/$mcp"

Recommended PTP Westport Channel e810 grandmaster clock configuration (PtpConfigGmWpc.yaml)

apiVersion: ptp.openshift.io/v1
kind: PtpConfig
metadata:
  name: grandmaster
  namespace: openshift-ptp
  annotations: {}
spec:
  profile:
    - name: "grandmaster"
      ptp4lOpts: "-2 --summary_interval -4"
      phc2sysOpts: -r -u 0 -m -O -37 -N 8 -R 16 -s $iface_master -n 24
      ptpSchedulingPolicy: SCHED_FIFO
      ptpSchedulingPriority: 10
      ptpSettings:
        logReduce: "true"
      plugins:
        e810:
          enableDefaultConfig: false
          settings:
            LocalMaxHoldoverOffSet: 1500
            LocalHoldoverTimeout: 14400
            MaxInSpecOffset: 100
          pins: $e810_pins
          #  "$iface_master":
          #    "U.FL2": "0 2"
          #    "U.FL1": "0 1"
          #    "SMA2": "0 2"
          #    "SMA1": "0 1"
          ublxCmds:
            - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1
                - "-P"
                - "29.20"
                - "-z"
                - "CFG-HW-ANT_CFG_VOLTCTRL,1"
              reportOutput: false
            - args: #ubxtool -P 29.20 -e GPS
                - "-P"
                - "29.20"
                - "-e"
                - "GPS"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d Galileo
                - "-P"
                - "29.20"
                - "-d"
                - "Galileo"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d GLONASS
                - "-P"
                - "29.20"
                - "-d"
                - "GLONASS"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d BeiDou
                - "-P"
                - "29.20"
                - "-d"
                - "BeiDou"
              reportOutput: false
            - args: #ubxtool -P 29.20 -d SBAS
                - "-P"
                - "29.20"
                - "-d"
                - "SBAS"
              reportOutput: false
            - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000
                - "-P"
                - "29.20"
                - "-t"
                - "-w"
                - "5"
                - "-v"
                - "1"
                - "-e"
                - "SURVEYIN,600,50000"
              reportOutput: true
            - args: #ubxtool -P 29.20 -p MON-HW
                - "-P"
                - "29.20"
                - "-p"
                - "MON-HW"
              reportOutput: true
      ts2phcOpts: " "
      ts2phcConf: |
        [nmea]
        ts2phc.master 1
        [global]
        use_syslog 0
        verbose 1
        logging_level 7
        ts2phc.pulsewidth 100000000
        #GNSS module s /dev/ttyGNSS* -al use _0
        #cat /dev/ttyGNSS_1700_0 to find available serial port
        #example value of gnss_serialport is /dev/ttyGNSS_1700_0
        ts2phc.nmea_serialport $gnss_serialport
        leapfile /usr/share/zoneinfo/leap-seconds.list
        [$iface_master]
        ts2phc.extts_polarity rising
        ts2phc.extts_correction 0
      ptp4lConf: |
        [$iface_master]
        masterOnly 1
        [$iface_master_1]
        masterOnly 1
        [$iface_master_2]
        masterOnly 1
        [$iface_master_3]
        masterOnly 1
        [global]
        #
        # Default Data Set
        #
        twoStepFlag 1
        priority1 128
        priority2 128
        domainNumber 24
        #utc_offset 37
        clockClass 6
        clockAccuracy 0x27
        offsetScaledLogVariance 0xFFFF
        free_running 0
        freq_est_interval 1
        dscp_event 0
        dscp_general 0
        dataset_comparison G.8275.x
        G.8275.defaultDS.localPriority 128
        #
        # Port Data Set
        #
        logAnnounceInterval -3
        logSyncInterval -4
        logMinDelayReqInterval -4
        logMinPdelayReqInterval 0
        announceReceiptTimeout 3
        syncReceiptTimeout 0
        delayAsymmetry 0
        fault_reset_interval -4
        neighborPropDelayThresh 20000000
        masterOnly 0
        G.8275.portDS.localPriority 128
        #
        # Run time options
        #
        assume_two_step 0
        logging_level 6
        path_trace_enabled 0
        follow_up_info 0
        hybrid_e2e 0
        inhibit_multicast_service 0
        net_sync_monitor 0
        tc_spanning_tree 0
        tx_timestamp_timeout 50
        unicast_listen 0
        unicast_master_table 0
        unicast_req_duration 3600
        use_syslog 1
        verbose 0
        summary_interval -4
        kernel_leap 1
        check_fup_sync 0
        clock_class_threshold 7
        #
        # Servo Options
        #
        pi_proportional_const 0.0
        pi_integral_const 0.0
        pi_proportional_scale 0.0
        pi_proportional_exponent -0.3
        pi_proportional_norm_max 0.7
        pi_integral_scale 0.0
        pi_integral_exponent 0.4
        pi_integral_norm_max 0.3
        step_threshold 2.0
        first_step_threshold 0.00002
        clock_servo pi
        sanity_freq_limit 200000000
        ntpshm_segment 0
        #
        # Transport options
        #
        transportSpecific 0x0
        ptp_dst_mac 01:1B:19:00:00:00
        p2p_dst_mac 01:80:C2:00:00:0E
        udp_ttl 1
        udp6_scope 0x0E
        uds_address /var/run/ptp4l
        #
        # Default interface options
        #
        clock_type BC
        network_transport L2
        delay_mechanism E2E
        time_stamping hardware
        tsproc_mode filter
        delay_filter moving_median
        delay_filter_length 10
        egressLatency 0
        ingressLatency 0
        boundary_clock_jbod 0
        #
        # Clock description
        #
        productDescription ;;
        revisionData ;;
        manufacturerIdentity 00:00:00
        userDescription ;
        timeSource 0x20
  recommend:
    - profile: "grandmaster"
      priority: 4
      match:
        - nodeLabel: "node-role.kubernetes.io/$mcp"

The following optional PtpOperatorConfig CR configures PTP events reporting for the node.

Recommended PTP events configuration (PtpOperatorConfigForEvent.yaml)

apiVersion: ptp.openshift.io/v1
kind: PtpOperatorConfig
metadata:
  name: default
  namespace: openshift-ptp
  annotations: {}
spec:
  daemonNodeSelector:
    node-role.kubernetes.io/$mcp: ""
  ptpEventConfig:
    enableEventPublisher: true
    transportHost: "http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"

Extended Tuned profile

Single-node OpenShift clusters that run DU workloads require additional performance tuning for high-performance workloads. The following example Tuned CR extends the Tuned profile:

Recommended extended Tuned profile configuration (TunedPerformancePatch.yaml)

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
  annotations: {}
spec:
  profile:
    - name: performance-patch
      # Please note:
      # - The 'include' line must match the associated PerformanceProfile name, following below pattern
      #   include=openshift-node-performance-${PerformanceProfile.metadata.name}
      # - When using the standard (non-realtime) kernel, remove the kernel.timer_migration override from
      #   the [sysctl] section and remove the entire section if it is empty.
      data: |
        [main]
        summary=Configuration changes profile inherited from performance created tuned
        include=openshift-node-performance-openshift-node-performance-profile
        [sysctl]
        kernel.timer_migration=1
        [scheduler]
        group.ice-ptp=0:f:10:*:ice-ptp.*
        group.ice-gnss=0:f:10:*:ice-gnss.*
        [service]
        service.stalld=start,enable
        service.chronyd=stop,disable
  recommend:
    - machineConfigLabels:
        machineconfiguration.openshift.io/role: "$mcp"
      priority: 19
      profile: performance-patch

Table 4. Tuned CR options for single-node OpenShift clusters
Tuned CR field | Description

spec.profile.data

  • The include line that you set in spec.profile.data must match the associated PerformanceProfile CR name. For example, include=openshift-node-performance-${PerformanceProfile.metadata.name}.

  • When using the non-realtime kernel, remove the timer_migration override line from the [sysctl] section.
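
For illustration, the following sketch assumes a PerformanceProfile named openshift-node-performance-profile (an assumed name, consistent with the include line in the example above). The Tuned profile that the Node Tuning Operator generates from it is openshift-node-performance-<PerformanceProfile name>, and the include line in the performance-patch profile must reference that generated name:

# Sketch only: assumes a PerformanceProfile named "openshift-node-performance-profile".
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: openshift-node-performance-profile   # ${PerformanceProfile.metadata.name}
# ...
# Corresponding include line in the Tuned performance-patch profile data:
#   include=openshift-node-performance-${PerformanceProfile.metadata.name}
#   include=openshift-node-performance-openshift-node-performance-profile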

SR-IOV

Single root I/O virtualization (SR-IOV) is commonly used to enable fronthaul and midhaul networks. The following YAML example configures SR-IOV for a single-node OpenShift cluster.

The configuration of the SriovNetwork CR varies depending on your specific network and infrastructure requirements.

Recommended SriovOperatorConfig CR configuration (SriovOperatorConfig.yaml)

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  configDaemonNodeSelector:
    "node-role.kubernetes.io/$mcp": ""
  # Injector and OperatorWebhook pods can be disabled (set to "false") below
  # to reduce the number of management pods. It is recommended to start with the
  # webhook and injector pods enabled, and only disable them after verifying the
  # correctness of user manifests.
  # If the injector is disabled, containers using sr-iov resources must explicitly assign
  # them in the "requests"/"limits" section of the container spec, for example:
  #   containers:
  #   - name: my-sriov-workload-container
  #     resources:
  #       limits:
  #         openshift.io/<resource_name>: "1"
  #       requests:
  #         openshift.io/<resource_name>: "1"
  enableInjector: true
  enableOperatorWebhook: true
  logLevel: 0

Table 5. SriovOperatorConfig CR options for single-node OpenShift clusters
SriovOperatorConfig CR field | Description

spec.enableInjector

Disable Injector pods to reduce the number of management pods. Start with the Injector pods enabled, and only disable them after verifying the user manifests. If the injector is disabled, containers that use SR-IOV resources must explicitly assign them in the requests and limits section of the container spec.

For example:

containers:
- name: my-sriov-workload-container
  resources:
    limits:
      openshift.io/<resource_name>: 1
    requests:
      openshift.io/<resource_name>: 1

spec.enableOperatorWebhook

Disable OperatorWebhook pods to reduce the number of management pods. Start with the OperatorWebhook pods enabled, and only disable them after verifying the user manifests.

Recommended SriovNetwork configuration (SriovNetwork.yaml)

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: ""
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  # resourceName: ""
  networkNamespace: openshift-sriov-network-operator
  # vlan: ""
  # spoofChk: ""
  # ipam: ""
  # linkState: ""
  # maxTxRate: ""
  # minTxRate: ""
  # vlanQoS: ""
  # trust: ""
  # capabilities: ""

Table 6. SriovNetwork CR options for single-node OpenShift clusters
SriovNetwork CR field | Description

spec.vlan

Configure vlan with the VLAN for the midhaul network.
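
For example, a filled-in SriovNetwork CR for a midhaul network might look like the following sketch. The CR name, resource name, VLAN ID, and IPAM settings are illustrative assumptions only; substitute values that match your network.

# Illustrative sketch only. The name, resourceName, vlan, and ipam values are assumptions.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-nw-du-mh
  namespace: openshift-sriov-network-operator
spec:
  resourceName: du_mh
  networkNamespace: openshift-sriov-network-operator
  vlan: 150
  ipam: '{"type": "static"}'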

Recommended SriovNetworkNodePolicy CR configuration (SriovNetworkNodePolicy.yaml)

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: $name
  namespace: openshift-sriov-network-operator
  annotations: {}
spec:
  # The attributes for Mellanox/Intel based NICs as below.
  # deviceType: netdevice/vfio-pci
  # isRdma: true/false
  deviceType: $deviceType
  isRdma: $isRdma
  nicSelector:
    # The exact physical function name must match the hardware used
    pfNames: [$pfNames]
  nodeSelector:
    node-role.kubernetes.io/$mcp: ""
  numVfs: $numVfs
  priority: $priority
  resourceName: $resourceName

Table 7. SriovNetworkNodePolicy CR options for single-node OpenShift clusters
SriovNetworkNodePolicy CR field | Description

spec.deviceType

Configure deviceType as vfio-pci or netdevice. For Mellanox NICs, set deviceType: netdevice and isRdma: true. For Intel-based NICs, set deviceType: vfio-pci and isRdma: false.

spec.nicSelector.pfNames

Specifies the interface connected to the fronthaul network. The physical function name must exactly match the hardware.

spec.numVfs

Specifies the number of VFs for the fronthaul network.
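
As a concrete illustration, the following sketch fills in the policy for an Intel-based NIC on a fronthaul network, following the guidance above (see the Intel example below). The policy name, physical function name, VF count, priority, and resource name are assumed values for illustration only.

# Illustrative sketch for an Intel-based NIC; name, pfNames, numVfs, priority, and
# resourceName are assumed values.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriov-nnp-du-fh
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  isRdma: false
  nicSelector:
    pfNames: ["ens5f0"]
  nodeSelector:
    node-role.kubernetes.io/master: ""
  numVfs: 8
  priority: 10
  resourceName: du_fh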

Recommended SR-IOV kernel configurations (07-sriov-related-kernel-args-master.yaml)

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 07-sriov-related-kernel-args-master
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - intel_iommu=on
    - iommu=pt

Console Operator

Use the cluster capabilities feature to prevent the Console Operator from being installed. When the node is centrally managed, the Console Operator is not needed. Removing the Operator frees additional capacity for application workloads.

To disable the Console Operator during the installation of the managed cluster, set the following in the spec.clusters.0.installConfigOverrides field of the SiteConfig custom resource (CR):

installConfigOverrides: "{\"capabilities\":{\"baselineCapabilitySet\": \"None\" }}"
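
For context, the following sketch shows where the override sits within a SiteConfig CR. The site and cluster names are placeholders, and the remaining SiteConfig fields are omitted.

# Sketch only: placement of installConfigOverrides in a SiteConfig CR. Names are placeholders.
apiVersion: ran.openshift.io/v1
kind: SiteConfig
metadata:
  name: example-sno
  namespace: example-sno
spec:
  baseDomain: example.com
  clusters:
    - clusterName: example-sno
      installConfigOverrides: "{\"capabilities\":{\"baselineCapabilitySet\": \"None\" }}"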

Alertmanager

Single-node OpenShift clusters that run DU workloads must minimize the CPU resources consumed by the OKD monitoring components. The following ConfigMap custom resource (CR) disables Alertmanager, Grafana, and the Telemeter client, and limits Prometheus metrics retention to 24 hours.

Recommended cluster monitoring configuration (ReduceMonitoringFootprint.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  annotations: {}
data:
  config.yaml: |
    grafana:
      enabled: false
    alertmanagerMain:
      enabled: false
    telemeterClient:
      enabled: false
    prometheusK8s:
      retention: 24h

Operator Lifecycle Manager

Single-node OpenShift clusters that run distributed unit workloads require consistent access to CPU resources. Operator Lifecycle Manager (OLM) collects performance data from Operators at regular intervals, which increases CPU utilization. The following ConfigMap custom resource (CR) disables the collection of Operator performance data by OLM.

Recommended cluster OLM configuration (ReduceOLMFootprint.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: collect-profiles-config
  namespace: openshift-operator-lifecycle-manager
data:
  pprof-config.yaml: |
    disabled: True

LVM Storage

You can dynamically provision local storage on single-node OpenShift clusters with Logical volume manager storage (LVM Storage).

The recommended storage solution for single-node OpenShift is the Local Storage Operator. Alternatively, you can use LVM Storage, but doing so requires additional CPU resources to be allocated.

The following YAML example makes the storage of the node available to OKD applications.

Recommended LVMCluster configuration (StorageLVMCluster.yaml)

apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: odf-lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: vg1
        deviceSelector:
          paths:
            - /usr/disk/by-path/pci-0000:11:00.0-nvme-1
        thinPoolConfig:
          name: thin-pool-1
          overprovisionRatio: 10
          sizePercent: 90

Table 8. LVMCluster CR options for single-node OpenShift clusters
LVMCluster CR field | Description

deviceSelector.paths

Configure the disks used for LVM Storage. If no disks are specified, LVM Storage uses all the unused disks in the specified thin pool.
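
Workloads can then request storage through the storage class that LVM Storage creates for the device class. Assuming the default naming pattern, the class for the vg1 device class is lvms-vg1; the following PersistentVolumeClaim sketch uses that assumed class name, an example namespace, and an example request size.

# Sketch only: assumes the storage class lvms-vg1 created for the vg1 device class,
# plus an example namespace and request size.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-lvm-pvc
  namespace: example-workload
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: lvms-vg1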

Network diagnostics

Single-node OpenShift clusters that run DU workloads require fewer inter-pod network connectivity checks to reduce the additional load created by the network diagnostics pods. The following custom resource (CR) disables these checks.

Recommended network diagnostics configuration (DisableSnoNetworkDiag.yaml)

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
  annotations: {}
spec:
  disableNetworkDiagnostics: true

Additional resources