Special Resource Operator

Learn about the Special Resource Operator (SRO) and how you can use it to build and manage driver containers for loading kernel modules and device drivers on nodes in an OKD cluster.

The Special Resource Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

About the Special Resource Operator

The Special Resource Operator (SRO) helps you manage the deployment of kernel modules and drivers on an existing OKD cluster. The SRO can be used for a case as simple as building and loading a single kernel module, or as complex as deploying the driver, device plug-in, and monitoring stack for a hardware accelerator.

For loading kernel modules, the SRO is designed around the use of driver containers. Driver containers are increasingly being used in cloud-native environments, especially when run on pure container operating systems, to deliver hardware drivers to the host. Driver containers extend the kernel stack beyond the out-of-the-box software and hardware features of a specific kernel. Driver containers work on various container-capable Linux distributions. With driver containers, the host operating system stays clean and there is no clash between different library versions or binaries on the host.

Installing the Special Resource Operator

As a cluster administrator, you can install the Special Resource Operator (SRO) by using the OpenShift CLI or the web console.

Installing the Special Resource Operator by using the CLI

As a cluster administrator, you can install the Special Resource Operator (SRO) by using the OpenShift CLI.

Prerequisites

  • You have a running OKD cluster.

  • You installed the OpenShift CLI (oc).

  • You are logged in to the OpenShift CLI as a user with cluster-admin privileges.

  • You installed the Node Feature Discovery (NFD) Operator.

Procedure

  1. Create a namespace for the Special Resource Operator:

    1. Create the following Namespace custom resource (CR) that defines the openshift-special-resource-operator namespace, and then save the YAML in the sro-namespace.yaml file:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-special-resource-operator

    2. Create the namespace by running the following command:

      $ oc create -f sro-namespace.yaml
  2. Install the SRO in the namespace you created in the previous step:

    1. Create the following OperatorGroup CR and save the YAML in the sro-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        generateName: openshift-special-resource-operator-
        name: openshift-special-resource-operator
        namespace: openshift-special-resource-operator
      spec:
        targetNamespaces:
        - openshift-special-resource-operator

    2. Create the operator group by running the following command:

      $ oc create -f sro-operatorgroup.yaml

    3. Run the following oc get command to get the channel value required for the next step:

      $ oc get packagemanifest openshift-special-resource-operator -n openshift-marketplace -o jsonpath='{.status.defaultChannel}'

      Example output

      4.9
    4. Create the following Subscription CR and save the YAML in the sro-sub.yaml file:

      Example Subscription CR

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: openshift-special-resource-operator
        namespace: openshift-special-resource-operator
      spec:
        channel: "4.9" (1)
        installPlanApproval: Automatic
        name: openshift-special-resource-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace

      (1) Replace the channel value with the output from the previous command.

    5. Create the subscription object by running the following command:

      $ oc create -f sro-sub.yaml

    6. Switch to the openshift-special-resource-operator project:

      $ oc project openshift-special-resource-operator
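
The channel substitution in the Subscription CR can also be scripted. The following is a minimal Python sketch, not part of the product: the helper name render_subscription is hypothetical, and the manifest text simply mirrors the example Subscription CR from the procedure. It assumes you pass in the channel string returned by the oc get packagemanifest command.

```python
from string import Template

# Mirrors the example Subscription CR from the procedure; only the channel varies.
SUBSCRIPTION_TEMPLATE = Template("""\
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-special-resource-operator
  namespace: openshift-special-resource-operator
spec:
  channel: "$channel"
  installPlanApproval: Automatic
  name: openshift-special-resource-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
""")


def render_subscription(channel: str) -> str:
    """Return the Subscription manifest for the given channel (hypothetical helper)."""
    channel = channel.strip()
    if not channel:
        raise ValueError("channel must not be empty")
    return SUBSCRIPTION_TEMPLATE.substitute(channel=channel)


if __name__ == "__main__":
    # "4.9" is the example output shown in the procedure; use your cluster's value.
    print(render_subscription("4.9"))
```

You could redirect the printed manifest to sro-sub.yaml and then create it with oc create -f sro-sub.yaml as described in the procedure.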

Verification

  • To verify that the Operator deployment is successful, run:

    $ oc get pods

    Example output

    NAME                                                   READY   STATUS    RESTARTS   AGE
    special-resource-controller-manager-7bfb544d45-xx62r   2/2     Running   0          2m28s

    A successful deployment shows a Running status.
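
If you want to automate this check, the tabular output of oc get pods can be inspected programmatically. The following Python sketch is an illustration only: the function name pods_not_running is hypothetical, and it assumes the default column layout shown in the example output, with whitespace-separated NAME and STATUS columns.

```python
def pods_not_running(oc_get_pods_output: str) -> list[str]:
    """Return the names of pods whose STATUS column is not 'Running'.

    Assumes the default `oc get pods` table layout with a header row.
    """
    lines = [ln for ln in oc_get_pods_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    not_running = []
    for line in lines[1:]:
        fields = line.split()
        if fields[status_col] != "Running":
            not_running.append(fields[name_col])
    return not_running


if __name__ == "__main__":
    sample = (
        "NAME READY STATUS RESTARTS AGE\n"
        "special-resource-controller-manager-7bfb544d45-xx62r 2/2 Running 0 2m28s\n"
    )
    # An empty list means every listed pod reports Running.
    print(pods_not_running(sample))
```

This helper is intended for the Operator namespace. In namespaces that contain build pods, a Completed status is expected and would be reported by this simple check.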

Installing the Special Resource Operator by using the web console

As a cluster administrator, you can install the Special Resource Operator (SRO) by using the OKD web console.

Prerequisites

  • You installed the Node Feature Discovery (NFD) Operator.

Procedure

  1. Log in to the OKD web console.

  2. Create the required namespace for the Special Resource Operator:

    1. Navigate to Administration → Namespaces and click Create Namespace.

    2. Enter openshift-special-resource-operator in the Name field and click Create.

  3. Install the Special Resource Operator:

    1. In the OKD web console, click Operators → OperatorHub.

    2. Choose Special Resource Operator from the list of available Operators, and then click Install.

    3. On the Install Operator page, select A specific namespace on the cluster, select the namespace that you created in the previous step, and then click Install.

Verification

To verify that the Special Resource Operator installed successfully:

  1. Navigate to the Operators → Installed Operators page.

  2. Ensure that Special Resource Operator is listed in the openshift-special-resource-operator project with a Status of InstallSucceeded.

    During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

  3. If the Operator does not appear as installed, to troubleshoot further:

    1. Navigate to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.

    2. Navigate to the Workloads → Pods page and check the logs for pods in the openshift-special-resource-operator project.

    The Node Feature Discovery (NFD) Operator is a dependency of the Special Resource Operator (SRO). If the NFD Operator is not installed before installing the SRO, the Operator Lifecycle Manager will automatically install the NFD Operator. However, the required Node Feature Discovery operand will not be deployed automatically. The Node Feature Discovery Operator documentation provides details about how to deploy NFD by using the NFD Operator.

Using the Special Resource Operator

The Special Resource Operator (SRO) is used to manage the build and deployment of a driver container. The objects required to build and deploy the container can be defined in a Helm chart.

The examples in this section use the simple-kmod kernel module to demonstrate how to use the SRO to build and run a driver container. In the first example, the SRO image contains a local repository of Helm charts including the templates for deploying the simple-kmod kernel module. In this case, a SpecialResource manifest is used to deploy the driver container. In the second example, the simple-kmod SpecialResource object points to a ConfigMap object that is created to store the Helm charts.

Building and running the simple-kmod SpecialResource by using the templates from the SRO image

The SRO image contains a local repository of Helm charts including the templates for deploying the simple-kmod kernel module. In this example, the simple-kmod kernel module is used to show how the SRO can manage a driver container that is defined in the internal SRO repository.

Prerequisites

  • You have a running OKD cluster.

  • You set the Image Registry Operator state to Managed for your cluster.

  • You installed the OpenShift CLI (oc).

  • You are logged in to the OpenShift CLI as a user with cluster-admin privileges.

  • You installed the Node Feature Discovery (NFD) Operator.

  • You installed the Special Resource Operator.

Procedure

  1. To deploy the simple-kmod kernel module by using the SRO image's local Helm repository, use the following SpecialResource manifest. Save this YAML as simple-kmod-local.yaml.

    apiVersion: sro.openshift.io/v1beta1
    kind: SpecialResource
    metadata:
      name: simple-kmod
    spec:
      namespace: simple-kmod
      chart:
        name: simple-kmod
        version: 0.0.1
        repository:
          name: example
          url: file:///charts/example
      set:
        kind: Values
        apiVersion: sro.openshift.io/v1beta1
        kmodNames: ["simple-kmod", "simple-procfs-kmod"]
        buildArgs:
        - name: "KMODVER"
          value: "SRO"
      driverContainer:
        source:
          git:
            ref: "master"
            uri: "https://github.com/openshift-psap/kvc-simple-kmod.git"

  2. Create the SpecialResource:

    $ oc create -f simple-kmod-local.yaml

    The simple-kmod resources are deployed in the simple-kmod namespace as specified in the object manifest. After a short time, the build pod for the simple-kmod driver container starts running. The build completes after a few minutes, and then the driver container pods start running.

  3. Use the oc get pods command to display the status of the pods:

    $ oc get pods -n simple-kmod

    Example output

    NAME                                                  READY   STATUS      RESTARTS   AGE
    simple-kmod-driver-build-12813789169ac0ee-1-build     0/1     Completed   0          7m12s
    simple-kmod-driver-container-12813789169ac0ee-mjsnh   1/1     Running     0          8m2s
    simple-kmod-driver-container-12813789169ac0ee-qtkff   1/1     Running     0          8m2s

  4. To display the logs of the simple-kmod driver container image build, use the oc logs command, along with the build pod name obtained above:

    $ oc logs pod/simple-kmod-driver-build-12813789169ac0ee-1-build -n simple-kmod

  5. To verify that the simple-kmod kernel modules are loaded, execute the lsmod command in one of the driver container pods that was returned from the oc get pods command above:

    $ oc exec -n simple-kmod -it pod/simple-kmod-driver-container-12813789169ac0ee-mjsnh -- lsmod | grep simple

    Example output

    simple_procfs_kmod     16384  0
    simple_kmod            16384  0

If you want to remove the simple-kmod kernel module from the node, delete the simple-kmod SpecialResource API object using the oc delete command. The kernel module is unloaded when the driver container pod is deleted.
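
Note that the module names in the lsmod output use underscores (simple_kmod), while the kmodNames entries in the SpecialResource manifest use hyphens (simple-kmod). The following Python sketch, with a hypothetical helper named missing_modules, shows one way to compare the two. It assumes that the first whitespace-separated field of each lsmod line is the module name, as in the example output above.

```python
def missing_modules(kmod_names: list[str], lsmod_output: str) -> list[str]:
    """Return the kmodNames entries that do not appear in the lsmod output.

    Hyphens in kmodNames are mapped to underscores before comparing, because
    the example lsmod output lists the modules with underscores.
    """
    loaded = {
        line.split()[0]
        for line in lsmod_output.splitlines()
        if line.split()
    }
    return [name for name in kmod_names if name.replace("-", "_") not in loaded]


if __name__ == "__main__":
    sample = "simple_procfs_kmod 16384 0\nsimple_kmod 16384 0\n"
    # An empty list means every requested module was found.
    print(missing_modules(["simple-kmod", "simple-procfs-kmod"], sample))
```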

Building and running the simple-kmod SpecialResource by using a config map

In this example, the simple-kmod kernel module is used to show how the SRO can manage a driver container that is defined in Helm chart templates stored in a config map.
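
The chart templates in this procedure contain placeholders such as {{.Values.specialresource.metadata.name}}, which are filled in from a nested set of values. The following Python sketch only illustrates the idea of a dotted lookup into nested values. It is not Helm and not the SRO implementation, and it handles none of the template actions, such as range, that the real templates use. The groupName value is an assumption inferred from the pod names shown in the example output, and the kernel version is a made-up illustrative string.

```python
import re

# Matches simple placeholders of the form {{.Values.a.b.c}} or {{ .Values.a }}.
_PLACEHOLDER = re.compile(r"\{\{\s*\.Values\.([A-Za-z0-9_.]+)\s*\}\}")


def resolve(path: str, values: dict):
    """Walk a dotted path such as 'specialresource.metadata.name' through nested dicts."""
    node = values
    for key in path.split("."):
        node = node[key]  # raises KeyError if the value is not defined
    return node


def render(text: str, values: dict) -> str:
    """Replace every simple .Values placeholder in text with its resolved value."""
    return _PLACEHOLDER.sub(lambda m: str(resolve(m.group(1), values)), text)


if __name__ == "__main__":
    values = {
        "specialresource": {"metadata": {"name": "simple-kmod"}},
        # Assumption: inferred from pod names like simple-kmod-driver-container-...
        "groupName": {"driverContainer": "driver-container"},
        # Illustrative value only, not taken from a real cluster.
        "kernelFullVersion": "0.0.0-example",
    }
    template = (
        "{{.Values.specialresource.metadata.name}}-"
        "{{.Values.groupName.driverContainer}}:v{{.Values.kernelFullVersion}}"
    )
    print(render(template, values))
```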

Prerequisites

  • You have a running OKD cluster.

  • You set the Image Registry Operator state to Managed for your cluster.

  • You installed the OpenShift CLI (oc).

  • You are logged in to the OpenShift CLI as a user with cluster-admin privileges.

  • You installed the Node Feature Discovery (NFD) Operator.

  • You installed the Special Resource Operator.

  • You installed the Helm CLI (helm).

Procedure

  1. To create a simple-kmod SpecialResource object, define an image stream and build config to build the image, and a service account, role, role binding, and daemon set to run the container. The service account, role, and role binding are required to run the daemon set with the privileged security context so that the kernel module can be loaded.

    1. Create a templates directory, and change into it:

      $ mkdir -p chart/simple-kmod-0.0.1/templates
      $ cd chart/simple-kmod-0.0.1/templates

    2. Save this YAML template for the image stream and build config in the templates directory as 0000-buildconfig.yaml:

      apiVersion: image.openshift.io/v1
      kind: ImageStream
      metadata:
        labels:
          app: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}} (1)
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}} (1)
      spec: {}
      ---
      apiVersion: build.openshift.io/v1
      kind: BuildConfig
      metadata:
        labels:
          app: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverBuild}} (1)
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverBuild}} (1)
        annotations:
          specialresource.openshift.io/wait: "true"
          specialresource.openshift.io/driver-container-vendor: simple-kmod
          specialresource.openshift.io/kernel-affine: "true"
      spec:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        runPolicy: "Serial"
        triggers:
          - type: "ConfigChange"
          - type: "ImageChange"
        source:
          git:
            ref: {{.Values.specialresource.spec.driverContainer.source.git.ref}}
            uri: {{.Values.specialresource.spec.driverContainer.source.git.uri}}
          type: Git
        strategy:
          dockerStrategy:
            dockerfilePath: Dockerfile.SRO
            buildArgs:
              - name: "IMAGE"
                value: {{ .Values.driverToolkitImage }}
              {{- range $arg := .Values.buildArgs }}
              - name: {{ $arg.name }}
                value: {{ $arg.value }}
              {{- end }}
              - name: KVER
                value: {{ .Values.kernelFullVersion }}
        output:
          to:
            kind: ImageStreamTag
            name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}:v{{.Values.kernelFullVersion}} (1)

      (1) Templates such as {{.Values.specialresource.metadata.name}} are filled in by the SRO, based on fields in the SpecialResource CR and on variables known to the Operator, such as {{.Values.kernelFullVersion}}.

    3. Save the following YAML template for the RBAC resources and daemon set in the templates directory as 1000-driver-container.yaml:

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
      rules:
      - apiGroups:
        - security.openshift.io
        resources:
        - securitycontextconstraints
        verbs:
        - use
        resourceNames:
        - privileged
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
      subjects:
      - kind: ServiceAccount
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
        namespace: {{.Values.specialresource.spec.namespace}}
      ---
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        labels:
          app: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
        name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
        annotations:
          specialresource.openshift.io/wait: "true"
          specialresource.openshift.io/state: "driver-container"
          specialresource.openshift.io/driver-container-vendor: simple-kmod
          specialresource.openshift.io/kernel-affine: "true"
          specialresource.openshift.io/from-configmap: "true"
      spec:
        updateStrategy:
          type: OnDelete
        selector:
          matchLabels:
            app: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
        template:
          metadata:
            # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
            # reserves resources for critical add-on pods so that they can be rescheduled after
            # a failure. This annotation works in tandem with the toleration below.
            annotations:
              scheduler.alpha.kubernetes.io/critical-pod: ""
            labels:
              app: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
          spec:
            serviceAccount: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
            serviceAccountName: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
            containers:
            - image: image-registry.openshift-image-registry.svc:5000/{{.Values.specialresource.spec.namespace}}/{{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}:v{{.Values.kernelFullVersion}}
              name: {{.Values.specialresource.metadata.name}}-{{.Values.groupName.driverContainer}}
              imagePullPolicy: Always
              command: ["/sbin/init"]
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh", "-c", "systemctl stop kmods-via-containers@{{.Values.specialresource.metadata.name}}"]
              securityContext:
                privileged: true
            nodeSelector:
              node-role.kubernetes.io/worker: ""
              feature.node.kubernetes.io/kernel-version.full: "{{.Values.KernelFullVersion}}"

    4. Change into the chart/simple-kmod-0.0.1 directory:

      $ cd ..

    5. Save the following YAML for the chart as Chart.yaml in the chart/simple-kmod-0.0.1 directory:

      apiVersion: v2
      name: simple-kmod
      description: Simple kmod will deploy a simple kmod driver-container
      icon: https://avatars.githubusercontent.com/u/55542927
      type: application
      version: 0.0.1
      appVersion: 1.0.0

  2. Change to the chart directory, which is the parent of the simple-kmod-0.0.1 directory, and then create the chart by using the helm package command:

    $ helm package simple-kmod-0.0.1/

    Example output

    Successfully packaged chart and saved it to: /data/<username>/git/<github_username>/special-resource-operator/yaml-for-docs/chart/simple-kmod-0.0.1/simple-kmod-0.0.1.tgz

  3. Create a config map to store the chart files:

    1. Create a directory for the config map files:

      $ mkdir cm

    2. Copy the Helm chart into the cm directory:

      $ cp simple-kmod-0.0.1.tgz cm/simple-kmod-0.0.1.tgz

    3. Create an index file specifying the Helm repo that contains the Helm chart:

      $ helm repo index cm --url=cm://simple-kmod/simple-kmod-chart

    4. Create a namespace for the objects defined in the Helm chart:

      $ oc create namespace simple-kmod

    5. Create the config map object:

      $ oc create cm simple-kmod-chart --from-file=cm/index.yaml --from-file=cm/simple-kmod-0.0.1.tgz -n simple-kmod

  4. Use the following SpecialResource manifest to deploy the simple-kmod object using the Helm chart that you created in the config map. Save this YAML as simple-kmod-configmap.yaml:

    apiVersion: sro.openshift.io/v1beta1
    kind: SpecialResource
    metadata:
      name: simple-kmod
    spec:
      #debug: true (1)
      namespace: simple-kmod
      chart:
        name: simple-kmod
        version: 0.0.1
        repository:
          name: example
          url: cm://simple-kmod/simple-kmod-chart (2)
      set:
        kind: Values
        apiVersion: sro.openshift.io/v1beta1
        kmodNames: ["simple-kmod", "simple-procfs-kmod"]
        buildArgs:
        - name: "KMODVER"
          value: "SRO"
      driverContainer:
        source:
          git:
            ref: "master"
            uri: "https://github.com/openshift-psap/kvc-simple-kmod.git"

    (1) Optional: Uncomment the #debug: true line to have the YAML files in the chart printed in full in the Operator logs, so that you can verify that the files are created and templated properly.
    (2) The spec.chart.repository.url field tells the SRO to look for the chart in a config map.

  5. From a command line, create the SpecialResource object:

    $ oc create -f simple-kmod-configmap.yaml

    The simple-kmod resources are deployed in the simple-kmod namespace as specified in the object manifest. After a short time, the build pod for the simple-kmod driver container starts running. The build completes after a few minutes, and then the driver container pods start running.

  6. Use the oc get pods command to display the status of the pods:

    $ oc get pods -n simple-kmod

    Example output

    NAME                                                  READY   STATUS      RESTARTS   AGE
    simple-kmod-driver-build-12813789169ac0ee-1-build     0/1     Completed   0          7m12s
    simple-kmod-driver-container-12813789169ac0ee-mjsnh   1/1     Running     0          8m2s
    simple-kmod-driver-container-12813789169ac0ee-qtkff   1/1     Running     0          8m2s

  7. Use the oc logs command, along with the build pod name obtained from the oc get pods command above, to display the logs of the simple-kmod driver container image build:

    $ oc logs pod/simple-kmod-driver-build-12813789169ac0ee-1-build -n simple-kmod

  8. To verify that the simple-kmod kernel modules are loaded, execute the lsmod command in one of the driver container pods that was returned from the oc get pods command above:

    $ oc exec -n simple-kmod -it pod/simple-kmod-driver-container-12813789169ac0ee-mjsnh -- lsmod | grep simple

    Example output

    simple_procfs_kmod     16384  0
    simple_kmod            16384  0

If you want to remove the simple-kmod kernel module from the node, delete the simple-kmod SpecialResource API object using the oc delete command. The kernel module is unloaded when the driver container pod is deleted.
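
The cm:// repository URL used in this procedure appears to encode where the chart is stored. The following Python sketch splits such a URL into its parts. It assumes the form cm://<namespace>/<config map name>, which is consistent with the simple-kmod namespace and the simple-kmod-chart config map created above but is not confirmed by this document as the general rule. The helper name parse_configmap_url is hypothetical.

```python
from urllib.parse import urlsplit


def parse_configmap_url(url: str) -> tuple[str, str]:
    """Split a cm:// chart repository URL into (namespace, config map name).

    Assumption: the URL has the form cm://<namespace>/<config-map-name>.
    """
    parts = urlsplit(url)
    if parts.scheme != "cm":
        raise ValueError(f"not a config map URL: {url!r}")
    namespace = parts.netloc
    name = parts.path.strip("/")
    if not namespace or not name or "/" in name:
        raise ValueError(f"expected cm://<namespace>/<name>, got {url!r}")
    return namespace, name


if __name__ == "__main__":
    print(parse_configmap_url("cm://simple-kmod/simple-kmod-chart"))
```

A file:// URL, such as the one used in the first example, is rejected by this helper because its scheme is not cm.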
