Deploying distributed units at scale in a disconnected environment

Use zero touch provisioning (ZTP) to provision distributed units at new edge sites in a disconnected environment. The workflow starts when the site is connected to the network and ends with the CNF workload deployed and running on the site nodes.

ZTP for RAN deployments is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

Provisioning edge sites at scale

Telco edge computing presents extraordinary challenges with managing hundreds to tens of thousands of clusters in hundreds of thousands of locations. These challenges require fully automated management solutions with as close to zero human interaction as possible.

Zero touch provisioning (ZTP) allows you to provision new edge sites with declarative configurations of bare-metal equipment at remote sites. Template or overlay configurations install OKD features that are required for CNF workloads. End-to-end functional test suites are used to verify CNF related features. All configurations are declarative in nature.

You start the workflow by creating declarative configurations for ISO images that are delivered to the edge nodes to begin the installation process. The images are used to repeatedly provision large numbers of nodes efficiently and quickly, allowing you to keep up with requirements from the field for far edge nodes.

Service providers are deploying a more distributed mobile network architecture enabled by the modular functional framework defined for 5G. This allows service providers to move from appliance-based radio access networks (RAN) to an open cloud RAN architecture, gaining flexibility and agility in delivering services to end users.

The following diagram shows how ZTP works within a far edge framework.

ZTP in a far edge framework

The GitOps approach

ZTP uses the GitOps set of practices for infrastructure deployment, which allows developers to perform tasks that would otherwise fall under the purview of IT operations. GitOps achieves these tasks using declarative specifications stored in Git repositories, such as YAML files and other defined patterns, that provide a framework for deploying the infrastructure. The declarative output is leveraged by the Open Cluster Manager (OCM) for multisite deployment.

One of the motivators for a GitOps approach is the requirement for reliability at scale. This is a significant challenge that GitOps helps solve.

GitOps addresses the reliability issue by providing traceability, RBAC, and a single source of truth for the desired state of each site. Scale issues are addressed by GitOps providing structure, tooling, and event driven operations through webhooks.

About ZTP and distributed units on single nodes

You can install a distributed unit (DU) on a single node at scale with Red Hat Advanced Cluster Management (RHACM, also referred to as ACM) using the assisted installer (AI) and the policy generator with core-reduction technology enabled. The DU installation is done using zero touch provisioning (ZTP) in a disconnected environment.

ACM manages clusters in a hub and spoke architecture, where a single hub cluster manages many spoke clusters. ACM applies radio access network (RAN) policies from predefined custom resources (CRs). Hub clusters running ACM provision and deploy the spoke clusters using ZTP and AI. DU installation follows the AI installation of OKD on a single node.

The AI service handles provisioning of OKD on single nodes running on bare metal. ACM ships with and deploys the assisted installer when the MultiClusterHub custom resource is installed.

With ZTP and AI, you can provision OKD single nodes to run your DUs at scale. A high level overview of ZTP for distributed units in a disconnected environment is as follows:

  • A hub cluster running ACM manages a disconnected internal registry that mirrors the OKD release images. The internal registry is used to provision the spoke single nodes.

  • You manage the bare metal host machines for your DUs in an inventory file that uses YAML for formatting. You store the inventory file in a Git repository.

  • You install the DU bare metal host machines on site, and make the hosts ready for provisioning. To be ready for provisioning, the following is required for each bare metal host:

    • Network connectivity - including DNS for your network. Hosts must be reachable from the hub and managed spoke clusters. Ensure there is layer 3 connectivity between the hub and the host where you want to install your spoke cluster.

    • Baseboard Management Controller (BMC) details for each host - ZTP uses the BMC URL and credentials to access the BMC for each host. Create spoke cluster definition CRs. These define the relevant elements for the managed clusters. Required CRs are as follows:

      Custom Resource: Description

      Namespace: Namespace for the managed single node cluster.
      BMCSecret CR: Credentials for the host BMC.
      Image Pull Secret CR: Pull secret for the disconnected registry.
      AgentClusterInstall: Specifies the single node cluster’s configuration such as networking, number of supervisor (control plane) nodes, and so on.
      ClusterDeployment: Defines the cluster name, domain, and other details.
      KlusterletAddonConfig: Manages installation and termination of add-ons on the ManagedCluster for ACM.
      ManagedCluster: Describes the managed cluster for ACM.
      InfraEnv: Describes the installation ISO to be mounted on the destination node that the assisted installer service creates. This is the final step of the manifest creation phase.
      BareMetalHost: Describes the details of the bare metal host, including BMC and credentials details.

  • When a change is detected in the host inventory repository, a host management event is triggered to provision the new or updated host.

  • The host is provisioned. When the host is provisioned and successfully rebooted, the host agent reports Ready status to the hub cluster.

Zero touch provisioning building blocks

ACM deploys single node OpenShift (SNO), which is OKD installed on single nodes, leveraging zero touch provisioning (ZTP). The initial site plan is broken down into smaller components and initial configuration data is stored in a Git repository. Zero touch provisioning uses a declarative GitOps approach to deploy these nodes. The deployment of the nodes includes:

  • Installing the host operating system (RHCOS) on a blank server.

  • Deploying OKD on single nodes.

  • Creating cluster policies and site subscriptions.

  • Leveraging a GitOps deployment topology for a develop once, deploy anywhere model.

  • Making the necessary network configurations to the server operating system.

  • Deploying profile Operators and performing any needed software-related configuration, such as performance profile, PTP, and SR-IOV.

  • Downloading images needed to run workloads (CNFs).

Single node clusters

You use zero touch provisioning (ZTP) to deploy single node clusters to run distributed units (DUs) on small hardware footprints at disconnected far edge sites. A single node cluster runs OKD on one bare metal machine that hosts both supervisor (control plane) and worker functions, and is deployed at low bandwidth or disconnected edge sites.

OKD is configured on the single node to use workload partitioning. Workload partitioning separates cluster management workloads from user workloads and can run the cluster management workloads on a reserved set of CPUs. Workload partitioning is useful for resource-constrained environments, such as single-node production deployments, where you want to reserve most of the CPU resources for user workloads and configure OKD to use fewer CPU resources within the host.

A single node cluster hosting a DU application on a node is divided into the following configuration categories:

  • Common - Values are the same for all single node cluster sites managed by a hub cluster.

  • Pools of sites - Common across a pool of sites where a pool size can be 1 to n.

  • Site specific - Likely specific to a site with no overlap with other sites, for example, a VLAN.

Site planning considerations for distributed unit deployments

Site planning for distributed units (DU) deployments is complex. The following is an overview of the tasks that you complete before the DU hosts are brought online in the production environment.

  • Develop a network model. The network model depends on various factors such as the size of the area of coverage, number of hosts, projected traffic load, DNS, and DHCP requirements.

  • Decide how many DU radio nodes are required to provide sufficient coverage and redundancy for your network.

  • Develop mechanical and electrical specifications for the DU host hardware.

  • Develop a construction plan for individual DU site installations.

  • Tune host BIOS settings for production, and deploy the BIOS configuration to the hosts.

  • Install the equipment on-site, connect hosts to the network, and apply power.

  • Configure on-site switches and routers.

  • Perform basic connectivity tests for the host machines.

  • Establish production network connectivity, and verify host connections to the network.

  • Provision and deploy on-site DU hosts at scale.

  • Test and verify on-site operations, performing load and scale testing of the DU hosts before finally bringing the DU infrastructure online in the live production environment.

Low latency for distributed units (DUs)

Low latency is an integral part of the development of 5G networks. Telecommunications networks require as little signal delay as possible to ensure quality of service in a variety of critical use cases.

Low latency processing is essential for any communication with timing constraints that affect functionality and security. For example, 5G Telco applications require a guaranteed one millisecond one-way latency to meet Internet of Things (IoT) requirements. Low latency is also critical for the future development of autonomous vehicles, smart factories, and online gaming. Networks in these environments require an almost real-time flow of data.

Low latency systems are about guarantees with regards to response and processing times. This includes keeping a communication protocol running smoothly, ensuring device security with fast responses to error conditions, or just making sure a system is not lagging behind when receiving a lot of data. Low latency is key for optimal synchronization of radio transmissions.

OKD enables low latency processing for DUs running on COTS hardware by using a number of technologies and specialized hardware devices:

Real-time kernel for RHCOS

Ensures workloads are handled with a high degree of process determinism.

CPU isolation

Avoids CPU scheduling delays and ensures CPU capacity is available consistently.

NUMA awareness

Aligns memory and huge pages with CPU and PCI devices to pin guaranteed container memory and huge pages to the NUMA node. This decreases latency and improves performance of the node.

Huge pages memory management

Using huge page sizes improves system performance by reducing the amount of system resources required to access page tables.

Precision timing synchronization using PTP

Allows synchronization between nodes in the network with sub-microsecond accuracy.
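Several of these capabilities, such as the performance profile mentioned earlier, are typically captured in a single PerformanceProfile custom resource. The following is a minimal sketch only; the profile name, CPU ranges, huge page count, and node selector are illustrative assumptions that you must tune for your hardware:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: du-performance-profile   # illustrative name
    spec:
      cpu:
        reserved: "0-1"              # CPUs reserved for cluster management workloads
        isolated: "2-31"             # CPUs isolated for latency-sensitive DU workloads
      hugepages:
        defaultHugepagesSize: "1G"
        pages:
        - size: "1G"
          count: 16
      realTimeKernel:
        enabled: true                # use the real-time kernel for RHCOS
      numa:
        topologyPolicy: "restricted" # align CPU and device allocation to NUMA nodes
      nodeSelector:
        node-role.kubernetes.io/master: ""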

Configuring BIOS for distributed unit bare-metal hosts

Distributed unit (DU) hosts require the BIOS to be configured before the host can be provisioned. The BIOS configuration is dependent on the specific hardware that runs your DUs and the particular requirements of your installation.

In this Technology Preview release, configuration and tuning of BIOS for DU bare-metal host machines is the responsibility of the customer. Automatic setting of BIOS is not handled by the zero touch provisioning workflow.

Procedure

  1. Set the UEFI/BIOS Boot Mode to UEFI.

  2. In the host boot sequence order, set Hard drive first.

  3. Apply the specific BIOS configuration for your hardware. The following table describes a representative BIOS configuration for an Intel Xeon Skylake or Intel Cascade Lake server, based on the Intel FlexRAN 4G and 5G baseband PHY reference design.

    The exact BIOS configuration depends on your specific hardware and network requirements. The following sample configuration is for illustrative purposes only.

    Table 1. Sample BIOS configuration for an Intel Xeon Skylake or Cascade Lake server
    BIOS Setting: Configuration
    CPU Power and Performance Policy: Performance
    Uncore Frequency Scaling: Disabled
    Performance P-limit: Disabled
    Enhanced Intel SpeedStep® Tech: Enabled
    Intel Configurable TDP: Enabled
    Configurable TDP Level: Level 2
    Intel® Turbo Boost Technology: Enabled
    Energy Efficient Turbo: Disabled
    Hardware P-States: Disabled
    Package C-State: C0/C1 state
    C1E: Disabled
    Processor C6: Disabled

Enable global SR-IOV and VT-d settings in the BIOS for the host. These settings are relevant to bare-metal environments.
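After the BIOS settings are applied, one generic way to spot-check from the installed host operating system that VT-d (IOMMU) is active is to search the kernel log. This is a standard Linux check and an assumption about your environment, not part of the ZTP workflow:

    $ dmesg | grep -i -e DMAR -e IOMMU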

Distributed unit host requirements

The following tables provide a high level overview of the networking information and custom resources required by Red Hat Advanced Cluster Management (RHACM) to provision a DU bare-metal host:

Table 2. Required AgentClusterInstall fields
Field: Description
imageSetRef: Installer image used to install OKD on the DU.
clusterNetwork: Used to allocate an IPv4 or IPv6 IP address to each node. Ensure there is no overlap with serviceNetwork.
serviceNetwork: Block of IPv4 or IPv6 IP addresses used for cluster services internal communication in OKD. Ensure there is no overlap with clusterNetwork.
machineNetwork: Represents the network range for external communication. Also used to determine the API and Ingress VIP addresses for provisioning the cluster.

Do not specify API and Ingress VIP addresses for DU single node clusters. Instead, when the host is provisioned by the assisted installer service, the machineNetwork field in the AgentClusterInstall CR is used to determine the API and Ingress VIP addresses.

Table 3. Required ClusterDeployment fields
Field: Description
baseDomain: Base domain for the managed DU single node cluster.
sshPrivateKeySecretRef: SSH private key for secure transactions with the single node cluster DU.
pullSecretRef: Pull secret for secure installation on the DU host.

Table 4. Required BareMetalHost fields
Field: Description
bmc: BMC address and BMC username and password credentials.
bootMACAddress: Boot MAC address for the bare-metal host.
bmac.agent-install.openshift.io/hostname: Optional annotation that configures the cluster hostname. If it is not set, a hostname is allocated by the cluster DHCP server.
spec.bmc.address: URL of the BMC, which is used to mount and boot the installation ISO on the host.
spec.bmc.credentialsName: Name of the bmcCredentials secret that is used to access the BMC.
userData.bootkey: Reference to the Secret containing the user data to be passed to the host before it boots from the ISO image.

Table 5. Required InfraEnv fields
Field: Description
additionalNTPSources: IP address for a Network Time Protocol (NTP) server. NTP is required to ensure that the certificates are installed correctly on the DU host. The NTP server is only required during provisioning.
pullSecretRef: Name of the pull secret created for the DU host.

Table 6. Required NMStateConfig fields
Field: Description
dns-resolver: Target cluster DNS server.
interfaces: Configures eno1 for IPv4 and IPv6 connectivity.
Routes: Configures the default route for the target cluster.
mac-address: Target bare-metal host MAC address. Must match the MAC address specified in the BareMetalHost custom resource (CR).
ip-address: Target bare-metal host static IP address.
public-network-prefix: Bare-metal host static IP address subnet.
gateway: Target bare-metal host gateway.
Interfaces: Target bare-metal host interface name and MAC address.

NMStateConfig is an optional resource. Use NMStateConfig to configure network bonding for a pair of NICs, to use a specific VLAN, or to declare a static IP for the DU host. Each NMState profile has a one-to-one relationship with a related InfraEnv ISO profile used for installing OKD on the host. If used, the NMStateConfig resource must be created before the ClusterDeployment resource. The NMStateConfig resource is not required if DHCP is enabled for the cluster network.

Preparing the disconnected environment

Before you can provision distributed units (DU) at scale, you must install Red Hat Advanced Cluster Management (RHACM), which handles the provisioning of the DUs.

RHACM is deployed as an Operator on the OKD hub cluster. It controls clusters and applications from a single console with built-in security policies. RHACM provisions and manages your DU hosts. To install RHACM in a disconnected environment, you create a mirror registry that mirrors the Operator Lifecycle Manager (OLM) catalog that contains the required Operator images. OLM manages, installs, and upgrades Operators and their dependencies in the cluster.

You also use a disconnected mirror host to serve the FCOS ISO and RootFS disk images that provision the DU bare-metal host operating system.

Before you install a cluster on infrastructure that you provision in a restricted network, you must mirror the required container images into that environment. You can also use this procedure in unrestricted networks to ensure your clusters only use container images that have satisfied your organizational controls on external content.

You must have access to the internet to obtain the necessary container images. In this procedure, you place the mirror registry on a mirror host that has access to both your network and the internet. If you do not have access to a mirror host, use the disconnected procedure to copy images to a device that you can move across network boundaries.

Disconnected environment prerequisites

You must have a container image registry that supports Docker v2-2, such as Red Hat Quay, in the location that will host the OKD cluster.

If you have an entitlement to Red Hat Quay, see the documentation on deploying Red Hat Quay for proof-of-concept purposes or by using the Quay Operator. If you need additional assistance selecting and installing a registry, contact your sales representative or Red Hat support.

About the mirror registry

You can mirror the images that are required for OKD installation and subsequent product updates to a mirror registry. These actions use the same process. The release image, which contains the description of the content, and the images it references are all mirrored. In addition, the Operator catalog source image and the images that it references must be mirrored for each Operator that you use. After you mirror the content, you configure each cluster to retrieve this content from your mirror registry.

The mirror registry can be any container registry that supports Docker v2-2. All major cloud provider registries, as well as Red Hat Quay, Artifactory, and others, have the necessary support. Using one of these registries ensures that OKD can verify the integrity of each image in disconnected environments.

The internal registry of the OKD cluster cannot be used as the target registry because it does not support pushing without a tag, which is required during the mirroring process.

The mirror registry must be reachable by every machine in the clusters that you provision. If the registry is unreachable, installation, updating, or normal operations such as workload relocation might fail. For that reason, you must run mirror registries in a highly available way, and the mirror registries must at least match the production availability of your OKD clusters.

When you populate a mirror registry with OKD images, you can follow two scenarios. If you have a host that can access both the internet and your mirror registry, but not your cluster nodes, you can directly mirror the content from that machine. This process is referred to as connected mirroring. If you have no such host, you must mirror the images to a file system and then bring that host or removable media into your restricted environment. This process is referred to as disconnected mirroring.

For mirrored registries, to view the source of pulled images, you must review the Trying to access log entry in the CRI-O logs. Other methods to view the image pull source, such as using the crictl images command on a node, show the non-mirrored image name, even though the image is pulled from the mirrored location.
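For example, the following is one way to filter the CRI-O journal on a node for these entries, assuming you know the node name; treat it as an illustrative check rather than the documented procedure:

    $ oc adm node-logs <node_name> -u crio | grep "Trying to access"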

Additional resources

For information on viewing the CRI-O logs to view the image source, see Viewing the image pull source.

Preparing your mirror host

Before you perform the mirror procedure, you must prepare the host to retrieve content and push it to the remote location.

Installing the OpenShift CLI by downloading the binary

You can install the OpenShift CLI (oc) to interact with OKD from a command-line interface. You can install oc on Linux, Windows, or macOS.

If you installed an earlier version of oc, you cannot use it to complete all of the commands in OKD 4.9. Download and install the new version of oc.

Installing the OpenShift CLI on Linux

You can install the OpenShift CLI (oc) binary on Linux by using the following procedure.

Procedure

  1. Navigate to https://mirror.openshift.com/pub/openshift-v4/clients/oc/latest/ and choose the folder for your operating system and architecture.

  2. Download oc.tar.gz.

  3. Unpack the archive:

    1. $ tar xvzf <file>
  4. Place the oc binary in a directory that is on your PATH.

    To check your PATH, execute the following command:

    1. $ echo $PATH

After you install the OpenShift CLI, it is available using the oc command:

  1. $ oc <command>
Installing the OpenShift CLI on Windows

You can install the OpenShift CLI (oc) binary on Windows by using the following procedure.

Procedure

  1. Navigate to https://mirror.openshift.com/pub/openshift-v4/clients/oc/latest/ and choose the folder for your operating system and architecture.

  2. Download oc.zip.

  3. Unzip the archive with a ZIP program.

  4. Move the oc binary to a directory that is on your PATH.

    To check your PATH, open the command prompt and execute the following command:

    1. C:\> path

After you install the OpenShift CLI, it is available using the oc command:

  1. C:\> oc <command>
Installing the OpenShift CLI on macOS

You can install the OpenShift CLI (oc) binary on macOS by using the following procedure.

Procedure

  1. Navigate to https://mirror.openshift.com/pub/openshift-v4/clients/oc/latest/ and choose the folder for your operating system and architecture.

  2. Download oc.tar.gz.

  3. Unpack and unzip the archive.

  4. Move the oc binary to a directory on your PATH.

    To check your PATH, open a terminal and execute the following command:

    1. $ echo $PATH

After you install the OpenShift CLI, it is available using the oc command:

  1. $ oc <command>

Configuring credentials that allow images to be mirrored

Create a container image registry credentials file that allows mirroring images from Red Hat to your mirror.

Prerequisites

  • You configured a mirror registry to use in your restricted network.

Procedure

Complete the following steps on the installation host:

  1. Generate the base64-encoded user name and password or token for your mirror registry:

    1. $ echo -n '<user_name>:<password>' | base64 -w0 (1)
    2. BGVtbYk3ZHAtqXs=
    1For <user_name> and <password>, specify the user name and password that you configured for your registry.
  2. Create a .json file and add a section that describes your registry to it:

    1. {
    2. "auths": {
    3. "<mirror_registry>": { (1)
    4. "auth": "<credentials>", (2)
    5. "email": "you@example.com"
    6. }
    7. }
    8. }
    1For <mirror_registry>, specify the registry domain name, and optionally the port, that your mirror registry uses to serve content. For example, registry.example.com or registry.example.com:5000
    2For <credentials>, specify the base64-encoded user name and password for the mirror registry.
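    For example, using the sample registry, credentials, and email address from the preceding steps, the completed file looks like the following. The values are illustrative only:

    {
      "auths": {
        "registry.example.com:5000": {
          "auth": "BGVtbYk3ZHAtqXs=",
          "email": "you@example.com"
        }
      }
    }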

Mirroring the OKD image repository

Mirror the OKD image repository to your registry to use during cluster installation or upgrade.

Prerequisites

  • Your mirror host has access to the internet.

  • You configured a mirror registry to use in your restricted network and can access the certificate and credentials that you configured.

  • You have created a pull secret for your mirror repository.

  • If you use self-signed certificates that do not set a Subject Alternative Name, you must precede the oc commands in this procedure with GODEBUG=x509ignoreCN=0. If you do not set this variable, the oc commands will fail with the following error:

    1. x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0

Procedure

Complete the following steps on the mirror host:

  1. Review the OKD downloads page to determine the version of OKD that you want to install and determine the corresponding tag on the Repository Tags page.

  2. Set the required environment variables:

    1. Export the release version:

      1. $ OCP_RELEASE=<release_version>

      For <release_version>, specify the tag that corresponds to the version of OKD to install, such as 4.5.4.

    2. Export the local registry name and host port:

      1. $ LOCAL_REGISTRY='<local_registry_host_name>:<local_registry_host_port>'

      For <local_registry_host_name>, specify the registry domain name for your mirror repository, and for <local_registry_host_port>, specify the port that it serves content on.

    3. Export the local repository name:

      1. $ LOCAL_REPOSITORY='<local_repository_name>'

      For <local_repository_name>, specify the name of the repository to create in your registry, such as ocp4/openshift4.

    4. Export the name of the repository to mirror:

      1. $ PRODUCT_REPO='openshift'
    5. Export the path to your registry pull secret:

      1. $ LOCAL_SECRET_JSON='<path_to_pull_secret>'

      For <path_to_pull_secret>, specify the absolute path to and file name of the pull secret for your mirror registry that you created.

    6. Export the release mirror:

      1. $ RELEASE_NAME="okd"
    7. Export the path to the directory to host the mirrored images:

      1. $ REMOVABLE_MEDIA_PATH=<path> (1)
      1Specify the full path, including the initial forward slash (/) character.
  3. Mirror the version images to the internal container registry:

    • If your mirror host does not have internet access, take the following actions:

      1. Connect the removable media to a system that is connected to the internet.

      2. Review the images and configuration manifests to mirror:

        1. $ oc adm release mirror -a ${LOCAL_SECRET_JSON} \
        2. --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE} \
        3. --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} \
        4. --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE} --dry-run
      3. Record the entire imageContentSources section from the output of the previous command. The information about your mirrors is unique to your mirrored repository, and you must add the imageContentSources section to the install-config.yaml file during installation.

      4. Mirror the images to a directory on the removable media:

        1. $ oc adm release mirror -a ${LOCAL_SECRET_JSON} --to-dir=${REMOVABLE_MEDIA_PATH}/mirror quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE}
      5. Take the media to the restricted network environment and upload the images to the local container registry.

        1. $ oc image mirror -a ${LOCAL_SECRET_JSON} --from-dir=${REMOVABLE_MEDIA_PATH}/mirror "file://openshift/release:${OCP_RELEASE}*" ${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} (1)
        1For REMOVABLE_MEDIA_PATH, you must use the same path that you specified when you mirrored the images.
    • If the local container registry is connected to the mirror host, take the following actions:

      1. Directly push the release images to the local registry by using following command:

        1. $ oc adm release mirror -a ${LOCAL_SECRET_JSON} \
        2. --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE} \
        3. --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} \
        4. --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}

        This command pulls the release information as a digest, and its output includes the imageContentSources data that you require when you install your cluster.

      2. Record the entire imageContentSources section from the output of the previous command. The information about your mirrors is unique to your mirrored repository, and you must add the imageContentSources section to the install-config.yaml file during installation.

        The image name gets patched to Quay.io during the mirroring process, and the output of the podman images command shows Quay.io in the registry field on the bootstrap virtual machine.

  4. To create the installation program that is based on the content that you mirrored, extract it and pin it to the release:

    • If your mirror host does not have internet access, run the following command:

      1. $ oc adm release extract -a ${LOCAL_SECRET_JSON} --command=openshift-install "${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}"
    • If the local container registry is connected to the mirror host, run the following command:

      1. $ oc adm release extract -a ${LOCAL_SECRET_JSON} --command=openshift-install "${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}"

      To ensure that you use the correct images for the version of OKD that you selected, you must extract the installation program from the mirrored content.

      You must perform this step on a machine with an active internet connection.

      If you are in a disconnected environment, use the --image flag as part of must-gather and point to the payload image.

  5. For clusters using installer-provisioned infrastructure, run the following command:

    1. $ openshift-install
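For reference, the following is a consolidated sketch of the connected mirroring path with all of the variables set inline. The registry host name, repository name, pull secret path, and release tag are assumptions that you must replace with your own values:

    $ OCP_RELEASE=4.9.0                                  # illustrative release tag
    $ LOCAL_REGISTRY='mirror-registry.example.com:5000'  # illustrative mirror registry
    $ LOCAL_REPOSITORY='ocp4/openshift4'                 # illustrative repository name
    $ PRODUCT_REPO='openshift'
    $ RELEASE_NAME='okd'
    $ LOCAL_SECRET_JSON='/path/to/pull-secret.json'      # illustrative pull secret path
    $ oc adm release mirror -a ${LOCAL_SECRET_JSON} \
        --from=quay.io/${PRODUCT_REPO}/${RELEASE_NAME}:${OCP_RELEASE} \
        --to=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY} \
        --to-release-image=${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:${OCP_RELEASE}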

Adding FCOS ISO and RootFS images to a disconnected mirror host

Before you install a cluster on infrastructure that you provision, you must create Fedora CoreOS (FCOS) machines for it to use. Use a disconnected mirror to host the FCOS images you require to provision your distributed unit (DU) bare-metal hosts.

Prerequisites

  • Deploy and configure an HTTP server to host the FCOS image resources on the network. You must be able to access the HTTP server from your computer, and from the machines that you create.

The FCOS images might not change with every release of OKD. You must download images with the highest version that is less than or equal to the OKD version that you install. Use the image versions that match your OKD version if they are available. You require ISO and RootFS images to install FCOS on the DU hosts. FCOS qcow2 images are not supported for this installation type.

Procedure

  1. Log in to the mirror host.

  2. Obtain the FCOS ISO and RootFS images from mirror.openshift.com, for example:

    1. Export the required image names and OKD version as environment variables:

      1. $ export ISO_IMAGE_NAME=<iso_image_name> (1)
      1. $ export ROOTFS_IMAGE_NAME=<rootfs_image_name> (2)
      1. $ export OCP_VERSION=<ocp_version> (3)
      1ISO image name, for example, rhcos-4.9.0-fc.1-x86_64-live.x86_64.iso
      2RootFS image name, for example, rhcos-4.9.0-fc.1-x86_64-live-rootfs.x86_64.img
      3OKD version, for example, latest-4.9
    2. Download the required images:

      1. $ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ISO_IMAGE_NAME} -O /var/www/html/${ISO_IMAGE_NAME}
      1. $ sudo wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/pre-release/${OCP_VERSION}/${ROOTFS_IMAGE_NAME} -O /var/www/html/${ROOTFS_IMAGE_NAME}

Verification steps

  • Verify that the images downloaded successfully and are being served on the disconnected mirror host, for example:

    1. $ wget http://$(hostname)/${ISO_IMAGE_NAME}

    Expected output

    1. ...
    2. Saving to: rhcos-4.9.0-fc.1-x86_64-live.x86_64.iso
    3. rhcos-4.9.0-fc.1-x86_64- 11%[====> ] 10.01M 4.71MB/s
    4. ...

Installing Red Hat Advanced Cluster Management in a disconnected environment

You use Red Hat Advanced Cluster Management (RHACM) on a hub cluster in the disconnected environment to manage the deployment of distributed unit (DU) profiles on multiple managed spoke clusters.

Prerequisites

  • Install the OKD CLI (oc).

  • Log in as a user with cluster-admin privileges.

  • Configure a disconnected mirror registry for use in the cluster.

    If you want to deploy Operators to the spoke clusters, you must also add them to this registry. See Mirroring an Operator catalog for more information.

Procedure

  • Install RHACM on the hub cluster in the disconnected environment. See the RHACM documentation for installing RHACM in disconnected network environments.

Enabling assisted installer service on bare metal

The Assisted Installer Service (AIS) deploys OKD clusters. Red Hat Advanced Cluster Management (RHACM) ships with AIS. AIS is deployed when you enable the MultiClusterHub Operator on the RHACM hub cluster.

For distributed units (DUs), RHACM supports OKD deployments that run on a single bare-metal host. The single node cluster acts as both a control plane and a worker node.

Prerequisites

  • Install OKD 4.9 on a hub cluster.

  • Install RHACM and create the MultiClusterHub resource.

  • Create persistent volume custom resources (CR) for database and file system storage.

  • You have installed the OpenShift CLI (oc).

Procedure

  1. Modify the HiveConfig resource to enable the feature gate for Assisted Installer:

    1. $ oc patch hiveconfig hive --type merge -p '{"spec":{"targetNamespace":"hive","logLevel":"debug","featureGates":{"custom":{"enabled":["AlphaAgentInstallStrategy"]},"featureSet":"Custom"}}}'
  2. Modify the Provisioning resource to allow the Bare Metal Operator to watch all namespaces:

    1. $ oc patch provisioning provisioning-configuration --type merge -p '{"spec":{"watchAllNamespaces": true }}'
  3. Create the AgentServiceConfig CR.

    1. Save the following YAML in the agent_service_config.yaml file:

      1. apiVersion: agent-install.openshift.io/v1beta1
      2. kind: AgentServiceConfig
      3. metadata:
      4. name: agent
      5. spec:
      6. databaseStorage:
      7. accessModes:
      8. - ReadWriteOnce
      9. resources:
      10. requests:
      11. storage: <db_volume_size> (1)
      12. filesystemStorage:
      13. accessModes:
      14. - ReadWriteOnce
      15. resources:
      16. requests:
      17. storage: <fs_volume_size> (2)
      18. osImages: (3)
      19. - openshiftVersion: "<ocp_version>" (4)
      20. version: "<ocp_release_version>" (5)
      21. url: "<iso_url>" (6)
      22. rootFSUrl: "<root_fs_url>" (7)
      23. cpuArchitecture: "x86_64"
      1Volume size for the databaseStorage field, for example 10Gi.
      2Volume size for the filesystemStorage field, for example 20Gi.
      3List of OS image details. Example describes a single OKD OS version.
      4OKD version to install, for example, 4.8.
      5Specific install version, for example, 47.83.202103251640-0.
      6ISO url, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-4.7.7-x86_64-live.x86_64.iso.
      7Root FS image URL, for example, https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/4.7.7/rhcos-live-rootfs.x86_64.img
    2. Create the AgentServiceConfig CR by running the following command:

      1. $ oc create -f agent_service_config.yaml

      Example output

      1. agentserviceconfig.agent-install.openshift.io/agent created
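To confirm that the assisted installer service started successfully, you can list the pods in its namespace. This sketch assumes the assisted-installer namespace that is used elsewhere in this document; the namespace can differ depending on your RHACM version:

    $ oc get pods -n assisted-installer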

ZTP custom resources

Zero touch provisioning (ZTP) uses custom resource (CR) objects to extend the Kubernetes API or introduce your own API into a project or a cluster. These CRs contain the site-specific data required to install and configure a cluster for RAN applications.

A custom resource definition (CRD) file defines your own object kinds. Deploying a CRD into the managed cluster causes the Kubernetes API server to begin serving the specified CR for the entire lifecycle.

For each CR in the <site>.yaml file for the managed cluster, ZTP uses the data to create installation CRs in a directory named for the cluster.

ZTP provides two ways for defining and installing CRs on managed clusters: a manual approach when you are provisioning a single cluster and an automated approach when provisioning multiple clusters.

Manual CR creation for single clusters

Use this method when you are creating CRs for a single cluster. This is a good way to test your CRs before deploying on a larger scale.

Automated CR creation for multiple managed clusters

Use the automated SiteConfig method when you are installing multiple managed clusters, for example, in batches of up to 100 clusters. SiteConfig uses ArgoCD as the engine for the GitOps method of site deployment. After completing a site plan that contains all of the required parameters for deployment, a policy generator creates the manifests and applies them to the hub cluster.

Both methods create the CRs shown in the following table. On the cluster site, an automated Discovery image ISO file creates a directory with the site name and a file with the cluster name. Every cluster has its own namespace, and all of the CRs are under that namespace. The namespace and the CR names match the cluster name.

Resource: BareMetalHost
Description: Contains the connection information for the Baseboard Management Controller (BMC) of the target bare metal machine.
Usage: Provides access to the BMC in order to load and boot the Discovery image ISO on the target machine by using the Redfish protocol.

Resource: InfraEnv
Description: Contains information for pulling OKD onto the target bare metal machine.
Usage: Used with ClusterDeployment to generate the Discovery ISO for the managed cluster.

Resource: AgentClusterInstall
Description: Specifies the managed cluster’s configuration such as networking and the number of supervisor (control plane) nodes. Shows the kubeconfig and credentials when the installation is complete.
Usage: Specifies the managed cluster configuration information and provides status during the installation of the cluster.

Resource: ClusterDeployment
Description: References the AgentClusterInstall to use.
Usage: Used with InfraEnv to generate the Discovery ISO for the managed cluster.

Resource: NMStateConfig
Description: Provides network configuration information such as MAC to IP mapping, DNS server, default route, and other network settings. This is not needed if DHCP is used.
Usage: Sets up a static IP address for the managed cluster’s Kube API server.

Resource: Agent
Description: Contains hardware information about the target bare metal machine.
Usage: Created automatically on the hub when the target machine’s Discovery image ISO boots.

Resource: ManagedCluster
Description: When a cluster is managed by the hub, it must be imported and known. This Kubernetes object provides that interface.
Usage: The hub uses this resource to manage and show the status of managed clusters.

Resource: KlusterletAddonConfig
Description: Contains the list of services provided by the hub to be deployed to a ManagedCluster.
Usage: Tells the hub which addon services to deploy to a ManagedCluster.

Resource: Namespace
Description: Logical space for ManagedCluster resources existing on the hub. Unique per site.
Usage: Propagates resources to the ManagedCluster.

Resource: Secret
Description: Two custom resources are created: BMC Secret and Image Pull Secret.
Usage: BMC Secret authenticates into the target bare metal machine using its username and password. Image Pull Secret contains authentication information for the OKD image installed on the target bare metal machine.

Resource: ClusterImageSet
Description: Contains OKD image information such as the repository and image name.
Usage: Passed into resources to provide OKD images.

Creating custom resources to install a single managed cluster

This procedure tells you how to manually create and deploy a single managed cluster. If you are creating multiple clusters, perhaps hundreds, use the SiteConfig method described in “Creating ZTP custom resources for multiple managed clusters”.

Prerequisites

  • Enable Assisted Installer Service.

  • Ensure network connectivity:

    • The container within the hub must be able to reach the Baseboard Management Controller (BMC) address of the target bare metal machine.

    • The managed cluster must be able to resolve and reach the hub’s API hostname and *.apps hostname. Here is an example of the hub’s API and *.apps hostname:

      1. console-openshift-console.apps.hub-cluster.internal.domain.com
      2. api.hub-cluster.internal.domain.com
    • The hub must be able to resolve and reach the API and *.apps hostname of the managed cluster. Here is an example of the managed cluster’s API and *.apps hostname:

      1. console-openshift-console.apps.sno-managed-cluster-1.internal.domain.com
      2. api.sno-managed-cluster-1.internal.domain.com
    • A DNS Server that is IP reachable from the target bare metal machine.

  • A target bare metal machine for the managed cluster with the following hardware minimums:

    • 4 CPU or 8 vCPU

    • 32 GiB RAM

    • 120 GiB Disk for root filesystem

  • When working in a disconnected environment, the release image needs to be mirrored. Use this command to mirror the release image:

    1. oc adm release mirror -a <pull_secret.json> \
    2. --from=quay.io/openshift-release-dev/ocp-release:{{ mirror_version_spoke_release }} \
    3. --to={{ provisioner_cluster_registry }}/ocp4 \
    4. --to-release-image={{ provisioner_cluster_registry }}/ocp4:{{ mirror_version_spoke_release }}
  • You mirrored the ISO and rootfs used to generate the spoke cluster ISO to an HTTP server and configured the settings to pull images from there.

    The images must match the version of the ClusterImageSet. To deploy a 4.9.0 version, the rootfs and ISO need to be set at 4.9.0.

Procedure

  1. Create a ClusterImageSet for each specific cluster version that needs to be deployed. A ClusterImageSet has the following format:

    1. apiVersion: hive.openshift.io/v1
    2. kind: ClusterImageSet
    3. metadata:
    4. name: openshift-4.9.0-rc.0 (1)
    5. spec:
    6. releaseImage: quay.io/openshift-release-dev/ocp-release:4.9.0-x86_64 (2)
    1name is the descriptive version that you want to deploy.
    2releaseImage needs to point to the specific release image to deploy.
  2. Create the Namespace definition for the managed cluster:

    1. apiVersion: v1
    2. kind: Namespace
    3. metadata:
    4. name: <cluster-name> (1)
    5. labels:
    6. name: <cluster-name> (1)
    1cluster-name is the name of the managed cluster to provision.
  3. Create the BMC Secret custom resource:

    1. apiVersion: v1
    2. data:
    3. password: <bmc-password> (1)
    4. username: <bmc-username> (2)
    5. kind: Secret
    6. metadata:
    7. name: <cluster-name>-bmc-secret
    8. namespace: <cluster-name>
    9. type: Opaque
    1bmc-password is the password to the target bare metal machine. Must be base-64 encoded.
    2bmc-username is the username to the target bare metal machine. Must be base-64 encoded.
  4. Create the Image Pull Secret custom resource:

    1. apiVersion: v1
    2. data:
    3. .dockerconfigjson: <pull-secret> (1)
    4. kind: Secret
    5. metadata:
    6. name: assisted-deployment-pull-secret
    7. namespace: <cluster-name>
    8. type: kubernetes.io/dockerconfigjson
    1pull-secret is the OKD pull secret. Must be base-64 encoded.
  5. Create the AgentClusterInstall custom resource:

    1. apiVersion: extensions.hive.openshift.io/v1beta1
    2. kind: AgentClusterInstall
    3. metadata:
    4. # Only include the annotation if using OVN, otherwise omit the annotation
    5. annotations:
    6. agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
    7. name: <cluster-name>
    8. namespace: <cluster-name>
    9. spec:
    10. clusterDeploymentRef:
    11. name: <cluster-name>
    12. imageSetRef:
    13. name: <cluster-image-set> (1)
    14. networking:
    15. clusterNetwork:
    16. - cidr: 10.128.0.0/14
    17. hostPrefix: 23
    18. machineNetwork:
    19. - cidr: <machine-network-cidr> (2)
    20. serviceNetwork:
    21. - 172.30.0.0/16
    22. provisionRequirements:
    23. controlPlaneAgents: 1
    24. workerAgents: 0
    25. sshPublicKey: <public-key> (3)
    1cluster-image-set is the name of the ClusterImageSet custom resource.
    2machine-network-cidr is the target bare metal machine’s CIDR.
    3public-key entered as plain text can be used to SSH into the target bare metal machine after the host is installed.

    If you want to configure a static IP for the managed cluster at this point, see the procedure in this document for configuring static IP addresses for managed clusters.

  6. Create the ClusterDeployment custom resource:

    1. apiVersion: hive.openshift.io/v1
    2. kind: ClusterDeployment
    3. metadata:
    4. name: <cluster-name>
    5. namespace: <cluster-name>
    6. spec:
    7. baseDomain: <base-domain> (1)
    8. clusterInstallRef:
    9. group: extensions.hive.openshift.io
    10. kind: AgentClusterInstall
    11. name: <cluster-name>
    12. version: v1beta1
    13. clusterName: <cluster-name>
    14. platform:
    15. agentBareMetal:
    16. agentSelector:
    17. matchLabels:
    18. cluster-name: <cluster-name>
    19. pullSecretRef:
    20. name: assisted-deployment-pull-secret
    1base-domain is the managed cluster’s base domain.
  7. Create the KlusterletAddonConfig custom resource:

    1. apiVersion: agent.open-cluster-management.io/v1
    2. kind: KlusterletAddonConfig
    3. metadata:
    4. name: <cluster-name>
    5. namespace: <cluster-name>
    6. spec:
    7. clusterName: <cluster-name>
    8. clusterNamespace: <cluster-name>
    9. clusterLabels:
    10. cloud: auto-detect
    11. vendor: auto-detect
    12. applicationManager:
    13. enabled: true
    14. certPolicyController:
    15. enabled: false
    16. iamPolicyController:
    17. enabled: false
    18. policyController:
    19. enabled: true
    20. searchCollector:
    21. enabled: false (1)
    1Set enabled to true to enable the add-on or false to disable it. Keep the searchCollector add-on disabled.
  8. Create the ManagedCluster custom resource:

    1. apiVersion: cluster.open-cluster-management.io/v1
    2. kind: ManagedCluster
    3. metadata:
    4. name: <cluster-name>
    5. spec:
    6. hubAcceptsClient: true
  9. Create the InfraEnv custom resource:

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: InfraEnv
    3. metadata:
    4. name: <cluster-name>
    5. namespace: <cluster-name>
    6. spec:
    7. clusterRef:
    8. name: <cluster-name>
    9. namespace: <cluster-name>
    10. sshAuthorizedKey: <public-key> (1)
    11. agentLabelSelector:
    12. matchLabels:
    13. cluster-name: <cluster-name>
    14. pullSecretRef:
    15. name: assisted-deployment-pull-secret
    1Enter public-key as plain text and use it to SSH into the target bare metal machine when the host is booted from the ISO.
  10. Create the BareMetalHost custom resource:

    1. apiVersion: metal3.io/v1alpha1
    2. kind: BareMetalHost
    3. metadata:
    4. name: <cluster-name>
    5. namespace: <cluster-name>
    6. annotations:
    7. inspect.metal3.io: disabled
    8. labels:
    9. infraenvs.agent-install.openshift.io: "<cluster-name>"
    10. spec:
    11. bootMode: "UEFI"
    12. bmc:
    13. address: <bmc-address> (1)
    14. disableCertificateVerification: true
    15. credentialsName: <cluster-name>-bmc-secret
    16. bootMACAddress: <mac-address> (2)
    17. automatedCleaningMode: disabled
    18. online: true
    1bmc-address is the baseboard management controller (BMC) address of the target bare metal machine.
    2mac-address is the target bare metal machine’s MAC address.

    Optionally, you can add bmac.agent-install.openshift.io/hostname: <host-name> as an annotation to set the managed cluster’s hostname. Otherwise, the hostname defaults to either a hostname assigned by the DHCP server or to localhost.

  11. After you have created the custom resources, push the entire directory of generated custom resources to the Git repository you created for storing the custom resources.
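The following is a minimal sketch of that push, assuming the CRs were generated into a hypothetical sites/<cluster-name> directory in a clone of the repository and that the default branch is main:

    $ git add sites/<cluster-name>
    $ git commit -m "Add installation CRs for <cluster-name>"
    $ git push origin main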

Next step

To provision additional clusters, repeat this procedure for each cluster.

Configuring static IP addresses for managed clusters

Optionally, after creating the AgentClusterInstall custom resource, you can configure static IP addresses for the managed clusters.

You must create the NMStateConfig custom resource before creating the ClusterDeployment custom resource.

Prerequisites

  • Deploy and configure the AgentClusterInstall custom resource.

Procedure

  1. Create a NMStateConfig custom resource:

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: NMStateConfig
    3. metadata:
    4. name: <cluster-name>
    5. namespace: <cluster-name>
    6. labels:
    7. sno-cluster-<cluster-name>: <cluster-name>
    8. spec:
    9. config:
    10. interfaces:
    11. - name: eth0
    12. type: ethernet
    13. state: up
    14. mac-address: <mac-address> (1)
    15. ipv4:
    16. enabled: true
    17. address:
    18. - ip: <ip-address> (2)
    19. prefix-length: <public-network-prefix> (3)
    20. dhcp: false
    21. dns-resolver:
    22. config:
    23. server:
    24. - <dns-resolver> (4)
    25. routes:
    26. config:
    27. - destination: 0.0.0.0/0
    28. next-hop-address: <gateway> (5)
    29. next-hop-interface: eth0
    30. table-id: 254
    31. interfaces:
    32. - name: "eth0" (6)
    33. macAddress: <mac-address> (7)
    1mac-address is the MAC address of the target bare metal machine, that is, the same MAC address used in the BareMetalHost resource.
    2ip-address is the static IP address of the target bare metal machine.
    3public-network-prefix is the static IP address’s subnet for the target bare metal machine.
    4dns-resolver is the DNS server for the target bare metal machine.
    5gateway is the gateway for the target bare metal machine.
    6name must match the name specified in the interfaces section.
    7mac-address must match the MAC address specified in the interfaces section.
  2. When creating the InfraEnv custom resource, reference the label from the NMStateConfig custom resource in the InfraEnv custom resource:

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: InfraEnv
    3. metadata:
    4. name: <cluster-name>
    5. namespace: <cluster-name>
    6. spec:
    7. clusterRef:
    8. name: <cluster-name>
    9. namespace: <cluster-name>
    10. sshAuthorizedKey: <public-key>
    11. agentLabelSelector:
    12. matchLabels:
    13. cluster-name: <cluster-name>
    14. pullSecretRef:
    15. name: assisted-deployment-pull-secret
    16. nmStateConfigLabelSelector:
    17. matchLabels:
    18. sno-cluster-<cluster-name>: <cluster-name> # Match this label

Automated Discovery image ISO process for provisioning clusters

After you create the custom resources, the following actions happen automatically:

  1. A Discovery image ISO file is generated and booted on the target machine.

  2. When the ISO file successfully boots on the target machine it reports the hardware information of the target machine.

  3. After all hosts are discovered, OKD is installed.

  4. When OKD finishes installing, the hub installs the klusterlet service on the target cluster.

  5. The requested add-on services are installed on the target cluster.

The Discovery image ISO process finishes when the Agent custom resource is created on the hub for the managed cluster.

Checking the managed cluster status

Ensure that cluster provisioning was successful by checking the cluster status.

Prerequisites

  • All of the custom resources have been configured and provisioned, and the Agent custom resource is created on the hub for the managed cluster.

Procedure

  1. Check the status of the managed cluster:

    1. $ oc get managedcluster

    True indicates the managed cluster is ready.

  2. Check the agent status:

    1. $ oc get agent -n <cluster-name>
  3. Use the describe command to provide an in-depth description of the agent’s condition. Statuses to be aware of include BackendError, InputError, ValidationsFailing, InstallationFailed, and AgentIsConnected. These statuses are relevant to the Agent and AgentClusterInstall custom resources.

    1. $ oc describe agent -n <cluster-name>
  4. Check the cluster provisioning status:

    1. $ oc get agentclusterinstall -n <cluster-name>
  5. Use the describe command to provide an in-depth description of the cluster provisioning status:

    1. $ oc describe agentclusterinstall -n <cluster-name>
  6. Check the status of the managed cluster’s add-on services:

    1. $ oc get managedclusteraddon -n <cluster-name>
  7. Retrieve the authentication information of the kubeconfig file for the managed cluster:

    1. $ oc get secret -n <cluster-name> <cluster-name>-admin-kubeconfig -o jsonpath={.data.kubeconfig} | base64 -d > <directory>/<cluster-name>-kubeconfig
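As an optional sanity check that is not part of the documented procedure, you can point oc at the retrieved kubeconfig to confirm API access to the managed cluster:

    $ oc --kubeconfig <directory>/<cluster-name>-kubeconfig get nodes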

Configuring a managed cluster for a disconnected environment

After you have completed the preceding procedure, follow these steps to configure the managed cluster for a disconnected environment.

Prerequisites

  • A disconnected installation of Red Hat Advanced Cluster Management (RHACM) 2.3.

  • Host the rootfs and iso images on an HTTPD server.

Procedure

  1. Create a ConfigMap containing the mirror registry config:

    1. apiVersion: v1
    2. kind: ConfigMap
    3. metadata:
    4. name: assisted-installer-mirror-config
    5. namespace: assisted-installer
    6. labels:
    7. app: assisted-service
    8. data:
    9. ca-bundle.crt: <certificate> (1)
    10. registries.conf: | (2)
    11. unqualified-search-registries = ["registry.access.redhat.com", "docker.io"]
    12. [[registry]]
    13. location = <mirror-registry-url> (3)
    14. insecure = false
    15. mirror-by-digest-only = true
    1certificate is the mirror registry’s certificate used when creating the mirror registry.
    2registries.conf is the configuration for the mirror registry.
    3mirror-registry-url is the URL of the mirror registry.

    Reference this ConfigMap through the mirrorRegistryRef field in the AgentServiceConfig custom resource, as shown in the following example:

    Example output

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: AgentServiceConfig
    3. metadata:
    4. name: agent
    5. namespace: assisted-installer
    6. spec:
    7. databaseStorage:
    8. volumeName: <db-pv-name>
    9. accessModes:
    10. - ReadWriteOnce
    11. resources:
    12. requests:
    13. storage: <db-storage-size>
    14. filesystemStorage:
    15. volumeName: <fs-pv-name>
    16. accessModes:
    17. - ReadWriteOnce
    18. resources:
    19. requests:
    20. storage: <fs-storage-size>
    21. mirrorRegistryRef:
    22. name: 'assisted-installer-mirror-config'
    23. osImages:
    24. - openshiftVersion: <ocp-version>
    25. rootfs: <rootfs-url> (1)
    26. url: <iso-url> (1)
    1rootfs-url and the iso-url must match the URLs of the HTTPD server.
  2. For disconnected installations, you must deploy an NTP clock that is reachable through the disconnected network. You can do this by configuring chrony to act as a server, editing the /etc/chrony.conf file, and adding the following allowed IPv6 range:

    1. # Allow NTP client access from local network.
    2. #allow 192.168.0.0/16
    3. local stratum 10
    4. bindcmdaddress ::
    5. allow 2620:52:0:1310::/64
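After you edit /etc/chrony.conf, restart the chrony service so that the new configuration takes effect. This assumes a systemd-managed chronyd on the NTP host:

    $ sudo systemctl restart chronyd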

Configuring IPv6 addresses for a disconnected environment

Optionally, when you are creating the AgentClusterInstall custom resource, you can configure IPv6 addresses for the managed clusters.

Procedure

  1. In the AgentClusterInstall custom resource, modify the IP addresses in clusterNetwork and serviceNetwork for IPv6 addresses:

    1. apiVersion: extensions.hive.openshift.io/v1beta1
    2. kind: AgentClusterInstall
    3. metadata:
    4. # Only include the annotation if using OVN, otherwise omit the annotation
    5. annotations:
    6. agent-install.openshift.io/install-config-overrides: '{"networking":{"networkType":"OVNKubernetes"}}'
    7. name: <cluster-name>
    8. namespace: <cluster-name>
    9. spec:
    10. clusterDeploymentRef:
    11. name: <cluster-name>
    12. imageSetRef:
    13. name: <cluster-image-set>
    14. networking:
    15. clusterNetwork:
    16. - cidr: "fd01::/48"
    17. hostPrefix: 64
    18. machineNetwork:
    19. - cidr: <machine-network-cidr>
    20. serviceNetwork:
    21. - "fd02::/112"
    22. provisionRequirements:
    23. controlPlaneAgents: 1
    24. workerAgents: 0
    25. sshPublicKey: <public-key>
  2. Update the NMStateConfig custom resource with the IPv6 addresses you defined.
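
    The following sketch shows the kind of change this involves in the NMStateConfig CR. The interface name, MAC address, and IPv6 address are illustrative assumptions and must match your site configuration:

    1. apiVersion: agent-install.openshift.io/v1beta1
    2. kind: NMStateConfig
    3. metadata:
    4.   name: <cluster-name>
    5.   namespace: <cluster-name>
    6. spec:
    7.   config:
    8.     interfaces:
    9.     - name: eno1                  # illustrative interface name
    10.       type: ethernet
    11.       state: up
    12.       ipv6:
    13.         enabled: true
    14.         address:
    15.         - ip: <ipv6-address>
    16.           prefix-length: 64
    17.   interfaces:
    18.   - name: eno1
    19.     macAddress: <mac-address>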

Troubleshooting the managed cluster

Use this procedure to diagnose any installation issues that might occur with the managed clusters.

Procedure

  1. Check the status of the managed cluster:

    1. $ oc get managedcluster

    Example output

    1. NAME HUB ACCEPTED MANAGED CLUSTER URLS JOINED AVAILABLE AGE
    2. SNO-cluster true True True 2d19h

    If the status in the AVAILABLE column is True, the managed cluster is being managed by the hub.

    If the status in the AVAILABLE column is Unknown, the managed cluster is not being managed by the hub. Continue with the following steps to gather more information.

  2. Check the ClusterDeployment installation status:

    1. $ oc get clusterdeployment -n <cluster-name>

    Example output

    1. NAME PLATFORM REGION CLUSTERTYPE INSTALLED INFRAID VERSION POWERSTATE AGE
    2. Sno0026 agent-baremetal false Initialized
    3. 2d14h

    If the status in the INSTALLED column is false, the installation was unsuccessful.

  3. If the installation failed, enter the following command to review the status of the AgentClusterInstall resource:

    1. $ oc describe agentclusterinstall -n <cluster-name> <cluster-name>
  4. Resolve the errors and reset the cluster:

    1. Remove the cluster’s namespace:

      1. $ oc delete namespace <cluster-name>

      This deletes all of the namespace-scoped custom resources created for this cluster.

    2. Remove the cluster’s managed cluster resource:

      1. $ oc delete managedcluster <cluster-name>
    3. Recreate the custom resources for the managed cluster.

Applying the RAN policies for monitoring cluster activity

Zero touch provisioning (ZTP) uses Red Hat Advanced Cluster Management (RHACM) to apply the radio access network (RAN) policies using a policy-based governance approach to automatically monitor cluster activity.

The policy generator (PolicyGen) is a Kustomize plugin that facilitates creating ACM policies from predefined custom resources. PolicyGen relies on three main items to generate the policies and their placement bindings and placement rules: the policy categorization, the source CR policies, and the PolicyGenTemplate.

The following diagram shows how the RAN policy generator interacts with GitOps and ACM.

RAN policy generator

RAN policies are categorized into three main groups:

Common

A policy in the Common category is applied to all clusters represented by the site plan.

Groups

A policy in the Groups category is applied to a group of clusters. Each group of clusters can have its own policies under the Groups category. For example, Groups/group1 can have its own policies that are applied to the clusters belonging to group1.

Sites

A policy in the Sites category is applied to a specific cluster. Any cluster can have its own policies in the Sites category. For example, the policies under Sites/cluster1 are applied only to cluster1.
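
Which categories apply to a given cluster is driven by the cluster labels defined in its site plan, as shown in the SiteConfig example later in this document. The following fragment is illustrative only; the label names must match the binding rules used by your policies:

  1. clusterLabels:
  2.   common: true        # receives policies from the Common category
  3.   group-du-sno: ""    # receives policies from Groups/group-du-sno
  4.   sites: "cluster1"   # receives policies from Sites/cluster1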

The following diagram shows how policies are generated.

Generating policies

Applying source custom resource policies

Source custom resource policies include the following:

  • SR-IOV policies

  • PTP policies

  • Performance Add-on Operator policies

  • MachineConfigPool policies

  • SCTP policies

You need to define the source custom resource that generates the ACM policy with consideration of possible overlays to its metadata or spec/data. For example, a common-namespace-policy contains a Namespace definition that exists in all managed clusters. This namespace is placed under the Common category, and its spec and data do not change across clusters.

Namespace policy example

The following example shows the source custom resource for this namespace:

  1. apiVersion: v1
  2. kind: Namespace
  3. metadata:
  4. name: openshift-sriov-network-operator
  5. labels:
  6. openshift.io/run-level: "1"

Example output

The generated policy that applies this namespace includes the namespace as it is defined above without any change, as shown in this example:

  1. apiVersion: policy.open-cluster-management.io/v1
  2. kind: Policy
  3. metadata:
  4. name: common-sriov-sub-ns-policy
  5. namespace: common-sub
  6. annotations:
  7. policy.open-cluster-management.io/categories: CM Configuration Management
  8. policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
  9. policy.open-cluster-management.io/standards: NIST SP 800-53
  10. spec:
  11. remediationAction: enforce
  12. disabled: false
  13. policy-templates:
  14. - objectDefinition:
  15. apiVersion: policy.open-cluster-management.io/v1
  16. kind: ConfigurationPolicy
  17. metadata:
  18. name: common-sriov-sub-ns-policy-config
  19. spec:
  20. remediationAction: enforce
  21. severity: low
  22. namespaceselector:
  23. exclude:
  24. - kube-*
  25. include:
  26. - '*'
  27. object-templates:
  28. - complianceType: musthave
  29. objectDefinition:
  30. apiVersion: v1
  31. kind: Namespace
  32. metadata:
  33. labels:
  34. openshift.io/run-level: "1"
  35. name: openshift-sriov-network-operator

SRIOV policy example

The following example shows the source custom resource for a SriovNetworkNodePolicy definition that exists in different clusters, with a different specification for each cluster:

  1. apiVersion: sriovnetwork.openshift.io/v1
  2. kind: SriovNetworkNodePolicy
  3. metadata:
  4. name: sriov-nnp
  5. namespace: openshift-sriov-network-operator
  6. spec:
  7. # The $ tells the policy generator to overlay/remove the spec.item in the generated policy.
  8. deviceType: $deviceType
  9. isRdma: false
  10. nicSelector:
  11. pfNames: [$pfNames]
  12. nodeSelector:
  13. node-role.kubernetes.io/worker: ""
  14. numVfs: $numVfs
  15. priority: $priority
  16. resourceName: $resourceName

Example output

The SriovNetworkNodePolicy name and namespace are the same for all clusters, so both are defined in the source SriovNetworkNodePolicy. However, the generated policy requires the $deviceType, $numVfs, and the other $-prefixed values as input parameters in order to adjust the policy for each cluster. The generated policy is shown in this example:

  1. apiVersion: policy.open-cluster-management.io/v1
  2. kind: Policy
  3. metadata:
  4. name: site-du-sno-1-sriov-nnp-mh-policy
  5. namespace: sites-sub
  6. annotations:
  7. policy.open-cluster-management.io/categories: CM Configuration Management
  8. policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
  9. policy.open-cluster-management.io/standards: NIST SP 800-53
  10. spec:
  11. remediationAction: enforce
  12. disabled: false
  13. policy-templates:
  14. - objectDefinition:
  15. apiVersion: policy.open-cluster-management.io/v1
  16. kind: ConfigurationPolicy
  17. metadata:
  18. name: site-du-sno-1-sriov-nnp-mh-policy-config
  19. spec:
  20. remediationAction: enforce
  21. severity: low
  22. namespaceselector:
  23. exclude:
  24. - kube-*
  25. include:
  26. - '*'
  27. object-templates:
  28. - complianceType: musthave
  29. objectDefinition:
  30. apiVersion: sriovnetwork.openshift.io/v1
  31. kind: SriovNetworkNodePolicy
  32. metadata:
  33. name: sriov-nnp-du-mh
  34. namespace: openshift-sriov-network-operator
  35. spec:
  36. deviceType: vfio-pci
  37. isRdma: false
  38. nicSelector:
  39. pfNames:
  40. - ens7f0
  41. nodeSelector:
  42. node-role.kubernetes.io/worker: ""
  43. numVfs: 8
  44. resourceName: du_mh

Defining input parameters as $value, for example $deviceType, is not mandatory. The $ prefix tells the policy generator to overlay or remove that item in the generated policy. Otherwise, the value is used unchanged.

The PolicyGenTemplate

The PolicyGenTemplate.yaml file is a custom resource (CR) that tells PolicyGen where to categorize the generated policies and which items need to be overlaid.

The following example shows the PolicyGenTemplate.yaml file:

  1. apiVersion: ran.openshift.io/v1
  2. kind: PolicyGenTemplate
  3. metadata:
  4. name: "group-du-sno"
  5. namespace: "group-du-sno"
  6. spec:
  7. bindingRules:
  8. group-du-sno: ""
  9. mcp: "master"
  10. sourceFiles:
  11. - fileName: ConsoleOperatorDisable.yaml
  12. policyName: "console-policy"
  13. - fileName: ClusterLogging.yaml
  14. policyName: "cluster-log-policy"
  15. spec:
  16. curation:
  17. curator:
  18. schedule: "30 3 * * *"
  19. collection:
  20. logs:
  21. type: "fluentd"
  22. fluentd: {}

The group-du-ranGen.yaml file defines a group of policies under a group named group-du. The file also defines the MachineConfigPool worker-du that is used as the node selector for any other policy defined in sourceFiles. An ACM policy is generated for every source file that exists in sourceFiles, and a single placement binding and placement rule is generated to apply the cluster selection rule for the group-du policies.

Using the source file PtpConfigSlave.yaml as an example, the file defines a PtpConfig custom resource (CR). The generated policy for the PtpConfigSlave example is named group-du-ptp-config-policy. The PtpConfig CR defined in the generated group-du-ptp-config-policy is named du-ptp-slave. The spec defined in PtpConfigSlave.yaml is placed under du-ptp-slave along with the other spec items defined in the source file.
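
As a sketch, a PolicyGenTemplate sourceFiles entry that references this source file might look like the following. The policyName and the overlay values are illustrative assumptions drawn from the generated policy shown below, not required values:

  1. sourceFiles:
  2. - fileName: PtpConfigSlave.yaml
  3.   policyName: "ptp-config-policy"
  4.   spec:
  5.     profile:
  6.     - name: "slave"
  7.       interface: "ens5f0"
  8.       phc2sysOpts: "-a -r -n 24"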

The following example shows the group-du-ptp-config-policy:

  1. apiVersion: policy.open-cluster-management.io/v1
  2. kind: Policy
  3. metadata:
  4. name: group-du-ptp-config-policy
  5. namespace: groups-sub
  6. annotations:
  7. policy.open-cluster-management.io/categories: CM Configuration Management
  8. policy.open-cluster-management.io/controls: CM-2 Baseline Configuration
  9. policy.open-cluster-management.io/standards: NIST SP 800-53
  10. spec:
  11. remediationAction: enforce
  12. disabled: false
  13. policy-templates:
  14. - objectDefinition:
  15. apiVersion: policy.open-cluster-management.io/v1
  16. kind: ConfigurationPolicy
  17. metadata:
  18. name: group-du-ptp-config-policy-config
  19. spec:
  20. remediationAction: enforce
  21. severity: low
  22. namespaceselector:
  23. exclude:
  24. - kube-*
  25. include:
  26. - '*'
  27. object-templates:
  28. - complianceType: musthave
  29. objectDefinition:
  30. apiVersion: ptp.openshift.io/v1
  31. kind: PtpConfig
  32. metadata:
  33. name: slave
  34. namespace: openshift-ptp
  35. spec:
  36. recommend:
  37. - match:
  38. - nodeLabel: node-role.kubernetes.io/worker-du
  39. priority: 4
  40. profile: slave
  41. profile:
  42. - interface: ens5f0
  43. name: slave
  44. phc2sysOpts: -a -r -n 24
  45. ptp4lConf: |
  46. [global]
  47. #
  48. # Default Data Set
  49. #
  50. twoStepFlag 1
  51. slaveOnly 0
  52. priority1 128
  53. priority2 128
  54. domainNumber 24
  55. .....

Considerations when creating custom resource policies

  • The custom resources used to create the ACM policies should be defined with consideration of possible overlays to their metadata and spec/data. For example, if the custom resource metadata.name does not change between clusters, set the metadata.name value in the custom resource file. If the custom resource has multiple instances in the same cluster, the custom resource metadata.name must be defined in the policy template file.

  • To apply the node selector for a specific machine config pool, set the node selector value to $mcp so that the policy generator overlays the $mcp value with the mcp defined in the policy template, as shown in the sketch after this list.

  • Subscription source files do not change.
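
The following fragment is a minimal sketch of the $mcp pattern in a source custom resource. The resource shown is only an example; whether a given source CR uses $mcp depends on your policy templates:

  1. apiVersion: sriovnetwork.openshift.io/v1
  2. kind: SriovNetworkNodePolicy
  3. metadata:
  4.   name: sriov-nnp
  5.   namespace: openshift-sriov-network-operator
  6. spec:
  7.   nodeSelector:
  8.     node-role.kubernetes.io/$mcp: ""   # $mcp is overlaid with the mcp value from the PolicyGenTemplate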

Generating RAN policies

Prerequisites

  • Install Kustomize

  • Install golang

Procedure

  1. Build the plug-in using the following commands:

    1. $ cd ztp/ztp-policy-generator/kustomize/plugin/policyGenerator/v1/policygenerator/
    1. $ go build -o PolicyGenerator

    The kustomization.yaml file has a reference to the policyGenerator.yaml file. The following example shows the PolicyGenerator definition:

    1. apiVersion: policyGenerator/v1
    2. kind: PolicyGenerator
    3. metadata:
    4. name: acm-policy
    5. namespace: acm-policy-generator
    6. # The arguments must be given in the same order as defined below: --policyGenTempPath= --sourcePath= --outPath= --stdout --customResources
    7. argsOneLiner: ./ranPolicyGenTempExamples ./sourcePolicies ./out true false

    Where:

    • policyGenTempPath is the path to the policyGenTemp files.

    • sourcePath is the path to the source policies.

    • outPath is the path where the generated ACM policies are saved.

    • stdout: If true, prints the generated policies to the console.

    • customResources: If true, generates the CRs from the sourcePolicies files without wrapping them in ACM policies.

  2. Test PolicyGen by running the following commands:

    1. $ cd cnf-features-deploy/ztp/ztp-policy-generator/
    1. $ XDG_CONFIG_HOME=./ kustomize build --enable-alpha-plugins

    An out directory is created with the expected policies, as shown in this example:

    1. out
    2. ├── common
    3. ├── common-log-sub-ns-policy.yaml
    4. ├── common-log-sub-oper-policy.yaml
    5. ├── common-log-sub-policy.yaml
    6. ├── common-pao-sub-catalog-policy.yaml
    7. ├── common-pao-sub-ns-policy.yaml
    8. ├── common-pao-sub-oper-policy.yaml
    9. ├── common-pao-sub-policy.yaml
    10. ├── common-policies-placementbinding.yaml
    11. ├── common-policies-placementrule.yaml
    12. ├── common-ptp-sub-ns-policy.yaml
    13. ├── common-ptp-sub-oper-policy.yaml
    14. ├── common-ptp-sub-policy.yaml
    15. ├── common-sriov-sub-ns-policy.yaml
    16. ├── common-sriov-sub-oper-policy.yaml
    17. └── common-sriov-sub-policy.yaml
    18. ├── groups
    19. ├── group-du
    20. ├── group-du-mc-chronyd-policy.yaml
    21. ├── group-du-mc-mount-ns-policy.yaml
    22. ├── group-du-mcp-du-policy.yaml
    23. ├── group-du-mc-sctp-policy.yaml
    24. ├── group-du-policies-placementbinding.yaml
    25. ├── group-du-policies-placementrule.yaml
    26. ├── group-du-ptp-config-policy.yaml
    27. └── group-du-sriov-operconfig-policy.yaml
    28. └── group-sno-du
    29. ├── group-du-sno-policies-placementbinding.yaml
    30. ├── group-du-sno-policies-placementrule.yaml
    31. ├── group-sno-du-console-policy.yaml
    32. ├── group-sno-du-log-forwarder-policy.yaml
    33. └── group-sno-du-log-policy.yaml
    34. └── sites
    35. └── site-du-sno-1
    36. ├── site-du-sno-1-policies-placementbinding.yaml
    37. ├── site-du-sno-1-policies-placementrule.yaml
    38. ├── site-du-sno-1-sriov-nn-fh-policy.yaml
    39. ├── site-du-sno-1-sriov-nnp-mh-policy.yaml
    40. ├── site-du-sno-1-sriov-nw-fh-policy.yaml
    41. ├── site-du-sno-1-sriov-nw-mh-policy.yaml
    42. └── site-du-sno-1-.yaml

    The common policies are flat because they are applied to all clusters. However, the groups and sites directories have subdirectories for each group and site, because those policies are applied to different sets of clusters.

Cluster provisioning

Zero touch provisioning (ZTP) provisions clusters using a layered approach. The base components consist of Fedora CoreOS (FCOS), the basic operating system for the cluster, and OKD. After these components are installed, the worker node can join the existing cluster. When the node has joined the existing cluster, the 5G RAN profile Operators are applied.

The following diagram illustrates this architecture.

Cluster provisioning

The following RAN Operators are deployed on every cluster:

  • Machine Config

  • Precision Time Protocol (PTP)

  • Performance Addon Operator

  • SR-IOV

  • Local Storage Operator

  • Logging Operator

Machine Config Operator

The Machine Config Operator enables system definitions and low-level system settings such as workload partitioning, NTP, and SCTP. This Operator is installed with OKD.

A performance profile and the products it creates are applied to a node according to an associated machine config pool (MCP). The MCP holds valuable information about the progress of applying the machine configurations created by performance addons, which encompass kernel arguments, the kube config, huge pages allocation, and deployment of the realtime kernel (rt-kernel). The performance addons controller monitors changes in the MCP and updates the performance profile status accordingly.

Performance Addon Operator

The Performance Addon Operator provides the ability to enable advanced node performance tunings on a set of nodes.

OKD provides the Performance Addon Operator to implement automatic tuning to achieve low latency performance for OKD applications. The cluster administrator uses a performance profile configuration to make these changes in a more reliable way.

The administrator can specify updating the kernel to the realtime kernel (rt-kernel), reserving CPUs for management workloads, and isolating CPUs for running the workloads.
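
The following is a minimal sketch of a PerformanceProfile that expresses these choices. The CPU ranges and node selector are illustrative assumptions and must be adapted to your hardware and machine config pool:

  1. apiVersion: performance.openshift.io/v2
  2. kind: PerformanceProfile
  3. metadata:
  4.   name: perf-du
  5. spec:
  6.   cpu:
  7.     reserved: "0-1,52-53"     # CPUs reserved for management workloads
  8.     isolated: "2-51,54-103"   # CPUs dedicated to running the workloads
  9.   realTimeKernel:
  10.     enabled: true             # deploy the realtime kernel (rt-kernel)
  11.   nodeSelector:
  12.     node-role.kubernetes.io/worker-du: ""   # illustrative machine config pool label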

SR-IOV Operator

The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.

The SR-IOV Operator allows physical network interfaces to be virtualized and shared at the device level with network functions running within the cluster.

The SR-IOV Network Operator adds the SriovOperatorConfig.sriovnetwork.openshift.io CustomResourceDefinition resource. The Operator automatically creates a SriovOperatorConfig custom resource named default in the openshift-sriov-network-operator namespace. The default custom resource contains the SR-IOV Network Operator configuration for your cluster.
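
On a cluster where the Operator is installed, you can inspect this default configuration with a standard oc query, for example:

  1. $ oc get sriovoperatorconfig default -n openshift-sriov-network-operator -o yaml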

Precision Time Protocol Operator

Precision Time Protocol (PTP) is a protocol used to synchronize clocks in a network; the PTP Operator deploys and manages the PTP services on the cluster nodes. When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy. PTP support is divided between the kernel and user space.

The clocks synchronized by PTP are organized in a master-worker hierarchy. The workers are synchronized to their masters, which may themselves be workers to their own masters. The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. When a clock has only one port, it can be a master or a worker; such a clock is called an ordinary clock (OC). A clock with multiple ports can be a master on one port and a worker on another; such a clock is called a boundary clock (BC). The top-level master is called the grandmaster clock, which can be synchronized by using a Global Positioning System (GPS) time source. By using a GPS-based time source, disparate networks can be synchronized with a high degree of accuracy.

Creating ZTP custom resources for multiple managed clusters

If you are installing multiple managed clusters, zero touch provisioning (ZTP) uses ArgoCD and SiteConfig to manage the processes that create the custom resources (CR) and generate and apply the policies for multiple clusters, in batches of no more than 100, using the GitOps approach.

Installing and deploying the clusters is a two-stage process, as shown here:

GitOps approach for installing and deploying the clusters

Prerequisites for deploying the ZTP pipeline

  • An OpenShift cluster of version 4.8 or higher with the Red Hat OpenShift GitOps Operator installed.

  • Red Hat Advanced Cluster Management (RHACM) version 2.3 or later is installed.

  • For disconnected environments, make sure your source data Git repository and ztp-site-generator container image are accessible from the hub cluster.

  • If you want additional custom content, such as extra install manifests or custom resources (CR) for policies, add them to the /usr/src/hook/ztp/source-crs/extra-manifest/ directory. Similarly, you can add additional configuration CRs, as referenced from a PolicyGenTemplate, to the /usr/src/hook/ztp/source-crs/ directory.

    • Create a Containerfile that adds your additional manifests to the Red Hat provided image, for example:

      1. FROM <registry fqdn>/ztp-site-generator:latest (1)
      2. COPY myInstallManifest.yaml /usr/src/hook/ztp/source-crs/extra-manifest/
      3. COPY mySourceCR.yaml /usr/src/hook/ztp/source-crs/
      1 <registry fqdn> must point to a registry containing the ztp-site-generator container image provided by Red Hat.
    • Build a new container image that includes these additional files:

      1. $ podman build -t <image-tag> -f Containerfile.example .
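
      If the hub cluster pulls images from a mirror registry, you will likely also need to push the resulting image there so that the ZTP hook jobs can use it. This is a sketch, assuming the <image-tag> chosen in the build command above includes your registry host name:

      1. $ podman push <image-tag>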

Installing the GitOps ZTP pipeline

The procedures in this section tell you how to complete the following tasks:

  • Prepare the Git repository you need to host site configuration data.

  • Configure the hub cluster for generating the required installation and policy custom resources (CR).

  • Deploy the managed clusters using zero touch provisioning (ZTP).

Preparing the ZTP Git repository

Create a Git repository for hosting site configuration data. The zero touch provisioning (ZTP) pipeline requires read access to this repository.

Procedure

  1. Create a directory structure with separate paths for the SiteConfig and PolicyGenTemplate custom resources (CRs). One possible layout is shown in the sketch after this procedure.

  2. Add pre-sync.yaml and post-sync.yaml from resource-hook-example/<policygentemplates>/ to the path for the PolicyGenTemplate CRs.

  3. Add pre-sync.yaml and post-sync.yaml from resource-hook-example/<siteconfig>/ to the path for the SiteConfig CRs.

    If your hub cluster operates in a disconnected environment, you must update the image reference in all four pre-sync and post-sync hook CRs.

  4. Apply the policygentemplates.ran.openshift.io and siteconfigs.ran.openshift.io CR definitions.
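
The following layout is one possible structure for the repository; the directory and file names are illustrative only and must match the path values you later configure in the ArgoCD applications:

  1. site-plan-data/
  2. ├── siteconfig/
  3. │   ├── pre-sync.yaml
  4. │   ├── post-sync.yaml
  5. │   └── <site-name>.yaml
  6. └── policygentemplates/
  7.     ├── pre-sync.yaml
  8.     ├── post-sync.yaml
  9.     ├── common.yaml
  10.     ├── <group-name>.yaml
  11.     └── <site-name>.yaml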

Preparing the hub cluster for ZTP

You can configure your hub cluster with a set of ArgoCD applications that generate the required installation and policy custom resources (CR) for each site based on a zero touch provisioning (ZTP) GitOps flow.

Procedure

  1. Install the Red Hat OpenShift GitOps Operator on your hub cluster.

  2. Extract the administrator password for ArgoCD:

    1. $ oc get secret openshift-gitops-cluster -n openshift-gitops -o jsonpath='{.data.admin\.password}' | base64 -d
  3. Prepare the ArgoCD pipeline configuration:

    1. Clone the Git repository.

    2. Modify the source values of the two ArgoCD applications, deployment/clusters-app.yaml and deployment/policies-app.yaml, with the appropriate repoURL, targetRevision branch, and path values. The path values must match those used in your Git repository.

      Modify deployment/clusters-app.yaml:

      1. apiVersion: v1
      2. kind: Namespace
      3. metadata:
      4. name: clusters-sub
      5. ---
      6. apiVersion: argoproj.io/v1alpha1
      7. kind: Application
      8. metadata:
      9. name: clusters
      10. namespace: openshift-gitops
      11. spec:
      12. destination:
      13. server: https://kubernetes.default.svc
      14. namespace: clusters-sub
      15. project: default
      16. source:
      17. path: ztp/gitops-subscriptions/argocd/resource-hook-example/siteconfig (1)
      18. repoURL: https://github.com/openshift-kni/cnf-features-deploy (2)
      19. targetRevision: master (3)
      20. syncPolicy:
      21. automated:
      22. prune: true
      23. selfHeal: true
      24. syncOptions:
      25. - CreateNamespace=true
      1 path is the path in the Git repository that contains the SiteConfig CRs for the clusters.
      2 repoURL is the URL of the Git repository that contains the SiteConfig custom resources that define the site configuration for installing clusters.
      3 targetRevision is the branch in the Git repository that contains the relevant site configuration data.
    3. Modify deployment/policies-app.yaml:

      1. apiVersion: v1
      2. kind: Namespace
      3. metadata:
      4. name: policies-sub
      5. ---
      6. apiVersion: argoproj.io/v1alpha1
      7. kind: Application
      8. metadata:
      9. name: policies
      10. namespace: openshift-gitops
      11. spec:
      12. destination:
      13. server: https://kubernetes.default.svc
      14. namespace: policies-sub
      15. project: default
      16. source:
      17. directory:
      18. recurse: true
      19. path: ztp/gitops-subscriptions/argocd/resource-hook-example/policygentemplates (1)
      20. repoURL: https://github.com/openshift-kni/cnf-features-deploy (2)
      21. targetRevision: master (3)
      22. syncPolicy:
      23. automated:
      24. prune: true
      25. selfHeal: true
      26. syncOptions:
      27. - CreateNamespace=true
      1 path is the path in the Git repository that contains the PolicyGenTemplate CRs for the clusters.
      2 repoURL is the URL of the Git repository that contains the PolicyGenTemplate custom resources that specify configuration data for the site.
      3 targetRevision is the branch in the Git repository that contains the relevant configuration data.
  4. To apply the pipeline configuration to your hub cluster, enter this command:

    1. $ oc apply -k ./deployment

Creating the site secrets

Add the required secrets for the site to the hub cluster. These resources must be in a namespace with a name that matches the cluster name.

Procedure

  1. Create a secret for authenticating to the site Baseboard Management Controller (BMC). Ensure the secret name matches the name used in the SiteConfig. In this example, the secret name is test-sno-bmh-secret:

    1. apiVersion: v1
    2. kind: Secret
    3. metadata:
    4. name: test-sno-bmh-secret
    5. namespace: test-sno
    6. data:
    7. password: dGVtcA==
    8. username: cm9vdA==
    9. type: Opaque
  2. Create the pull secret for the site. The pull secret must contain all credentials necessary for installing OpenShift and all add-on Operators. In this example, the secret name is assisted-deployment-pull-secret:

    1. apiVersion: v1
    2. kind: Secret
    3. metadata:
    4. name: assisted-deployment-pull-secret
    5. namespace: test-sno
    6. type: kubernetes.io/dockerconfigjson
    7. data:
    8. .dockerconfigjson: <Your pull secret base64 encoded>

The secrets are referenced from the SiteConfig custom resource (CR) by name. The namespace must match the SiteConfig namespace.

Creating the SiteConfig custom resources

ArgoCD acts as the engine for the GitOps method of site deployment. After completing a site plan that contains the required custom resources for the site installation, a policy generator creates the manifests and applies them to the hub cluster.

Procedure

  1. Create one or more SiteConfig custom resources, site-config.yaml files, that contain the site plan data for the clusters. For example:

    1. apiVersion: ran.openshift.io/v1
    2. kind: SiteConfig
    3. metadata:
    4. name: "test-sno"
    5. namespace: "test-sno"
    6. spec:
    7. baseDomain: "clus2.t5g.lab.eng.bos.redhat.com"
    8. pullSecretRef:
    9. name: "assisted-deployment-pull-secret"
    10. clusterImageSetNameRef: "openshift-4.9"
    11. sshPublicKey: "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDB3dwhI5X0ZxGBb9VK7wclcPHLc8n7WAyKjTNInFjYNP9J+Zoc/ii+l3YbGUTuqilDwZN5rVIwBux2nUyVXDfaM5kPd9kACmxWtfEWTyVRootbrNWwRfKuC2h6cOd1IlcRBM1q6IzJ4d7+JVoltAxsabqLoCbK3svxaZoKAaK7jdGG030yvJzZaNM4PiTy39VQXXkCiMDmicxEBwZx1UsA8yWQsiOQ5brod9KQRXWAAST779gbvtgXR2L+MnVNROEHf1nEjZJwjwaHxoDQYHYKERxKRHlWFtmy5dNT6BbvOpJ2e5osDFPMEd41d2mUJTfxXiC1nvyjk9Irf8YJYnqJgBIxi0IxEllUKH7mTdKykHiPrDH5D2pRlp+Donl4n+sw6qoDc/3571O93+RQ6kUSAgAsvWiXrEfB/7kGgAa/BD5FeipkFrbSEpKPVu+gue1AQeJcz9BuLqdyPUQj2VUySkSg0FuGbG7fxkKeF1h3Sga7nuDOzRxck4I/8Z7FxMF/e8DmaBpgHAUIfxXnRqAImY9TyAZUEMT5ZPSvBRZNNmLbfex1n3NLcov/GEpQOqEYcjG5y57gJ60/av4oqjcVmgtaSOOAS0kZ3y9YDhjsaOcpmRYYijJn8URAH7NrW8EZsvAoF6GUt6xHq5T258c6xSYUm5L0iKvBqrOW9EjbLw== root@cnfdc2.clus2.t5g.lab.eng.bos.redhat.com"
    12. clusters:
    13. - clusterName: "test-sno"
    14. clusterType: "sno"
    15. clusterProfile: "du"
    16. clusterLabels:
    17. group-du-sno: ""
    18. common: true
    19. sites : "test-sno"
    20. clusterNetwork:
    21. - cidr: 1001:db9::/48
    22. hostPrefix: 64
    23. machineNetwork:
    24. - cidr: 2620:52:0:10e7::/64
    25. serviceNetwork:
    26. - 1001:db7::/112
    27. additionalNTPSources:
    28. - 2620:52:0:1310::1f6
    29. nodes:
    30. - hostName: "test-sno.clus2.t5g.lab.eng.bos.redhat.com"
    31. bmcAddress: "idrac-virtualmedia+https://[2620:52::10e7:f602:70ff:fee4:f4e2]/redfish/v1/Systems/System.Embedded.1"
    32. bmcCredentialsName:
    33. name: "test-sno-bmh-secret"
    34. bootMACAddress: "0C:42:A1:8A:74:EC"
    35. bootMode: "UEFI"
    36. rootDeviceHints:
    37. hctl: '0:1:0'
    38. cpuset: "0-1,52-53"
    39. nodeNetwork:
    40. interfaces:
    41. - name: eno1
    42. macAddress: "0C:42:A1:8A:74:EC"
    43. config:
    44. interfaces:
    45. - name: eno1
    46. type: ethernet
    47. state: up
    48. macAddress: "0C:42:A1:8A:74:EC"
    49. ipv4:
    50. enabled: false
    51. ipv6:
    52. enabled: true
    53. address:
    54. - ip: 2620:52::10e7:e42:a1ff:fe8a:900
    55. prefix-length: 64
    56. dns-resolver:
    57. config:
    58. search:
    59. - clus2.t5g.lab.eng.bos.redhat.com
    60. server:
    61. - 2620:52:0:1310::1f6
    62. routes:
    63. config:
    64. - destination: ::/0
    65. next-hop-interface: eno1
    66. next-hop-address: 2620:52:0:10e7::fc
    67. table-id: 254
  2. Save the files and push them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application.

ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD synchronizes the new SiteConfig CRs to the hub cluster and launches the associated resource hooks. These hooks convert the site definitions into installation custom resources and apply them to the hub cluster:

  • Namespace - Unique per site

  • AgentClusterInstall

  • BareMetalHost

  • ClusterDeployment

  • InfraEnv

  • NMStateConfig

  • ExtraManifestsConfigMap - Extra manifests. The additional manifests include workload partitioning, chronyd, mountpoint hiding, sctp enablement, and more.

  • ManagedCluster

  • KlusterletAddonConfig

Red Hat Advanced Cluster Management (RHACM) (ACM) then deploys the managed cluster.
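
To confirm that the installation CRs were generated and applied, you can list them in the site namespace. This is a minimal check, assuming the namespace name matches the cluster name:

  1. $ oc get agentclusterinstall,clusterdeployment,infraenv,nmstateconfig,baremetalhost -n <cluster-name>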

Creating the PolicyGenTemplates

Use the following procedure to create the PolicyGenTemplates you will need for generating policies in your Git repository for the hub cluster.

Procedure

  1. Create the PolicyGenTemplates and save them to the zero touch provisioning (ZTP) Git repository accessible from the hub cluster and defined as a source repository of the ArgoCD application.

  2. ArgoCD detects that the application is out of sync. Upon sync, either automatic or manual, ArgoCD applies the new PolicyGenTemplate to the hub cluster and launches the associated resource hooks. These hooks are responsible for generating the policy wrapped configuration CRs that apply to the spoke cluster and perform the following actions:

    1. Create the Red Hat Advanced Cluster Management (RHACM) (ACM) policies according to the basic distributed unit (DU) profile and required customizations.

    2. Apply the generated policies to the hub cluster.

The ZTP process creates policies that direct ACM to apply the desired configuration to the cluster nodes.

Checking the installation status

The ArgoCD pipeline detects the SiteConfig and PolicyGenTemplate custom resources (CRs) in the Git repository and syncs them to the hub cluster. In the process, it generates installation and policy CRs and applies them to the hub cluster. You can monitor the progress of this synchronization in the ArgoCD dashboard.

Procedure

  1. Monitor the progress of cluster installation using the following commands:

    1. $ export CLUSTER=<clusterName>
    1. $ oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.conditions[?(@.type=="Completed")]}' | jq
    1. $ curl -sk $(oc get agentclusterinstall -n $CLUSTER $CLUSTER -o jsonpath='{.status.debugInfo.eventsURL}') | jq '.[-2,-1]'
  2. Use the Red Hat Advanced Cluster Management (RHACM) (ACM) dashboard to monitor the progress of policy reconciliation.

Site cleanup

To remove a site and the associated installation and policy custom resources (CRs), remove the SiteConfig and site-specific PolicyGenTemplate CRs from the Git repository. The pipeline hooks remove the generated CRs.

Before removing a SiteConfig CR, you must detach the cluster from ACM.
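
One way to detach the cluster, consistent with the troubleshooting steps earlier in this document, is to delete its ManagedCluster CR on the hub:

  1. $ oc delete managedcluster <cluster-name>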

Removing the ArgoCD pipeline

Use the following procedure if you want to remove the ArgoCD pipeline and all generated artifacts.

Procedure

  1. Detach all clusters from ACM.

  2. Delete all SiteConfig and PolicyGenTemplate custom resources (CRs) from your Git repository.

  3. Delete the following namespaces:

    • All policy namespaces:

      1. $ oc get policy -A
    • clusters-sub

    • policies-sub

  4. Delete the ArgoCD pipeline deployment by processing the deployment directory with the Kustomize tool:

    1. $ oc delete -k cnf-features-deploy/ztp/gitops-subscriptions/argocd/deployment

Troubleshooting GitOps ZTP

As noted, the ArgoCD pipeline synchronizes the SiteConfig and PolicyGenTemplate custom resources (CR) from the Git repository to the hub cluster. During this process, post-sync hooks create the installation and policy CRs that are also applied to the hub cluster. Use the following procedures to troubleshoot issues that might occur in this process.

Validating the generation of installation CRs

The SiteConfig custom resource (CR) generates installation CRs that are applied to the hub cluster in a namespace with a name that matches the site name. To check the status, enter the following command:

  1. $ oc get AgentClusterInstall -n <clusterName>

If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from SiteConfig to the installation CRs.

Procedure

  1. Check the synchronization of the SiteConfig to the hub cluster using either of the following commands:

    1. $ oc get siteconfig -A

    or

    1. $ oc get siteconfig -n clusters-sub

    If the SiteConfig is missing, one of the following situations has occurred:

    • The clusters application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this:

      1. $ oc describe -n openshift-gitops application clusters

      Check for Status: Synced and that the Revision: is the SHA of the commit you pushed to the subscribed repository.

    • The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the clusters application.

  2. Verify the post hook job ran:

    1. $ oc describe job -n clusters-sub siteconfig-post
    • If successful, the returned output indicates succeeded: 1.

    • If the job fails, ArgoCD retries it. In some cases, the first pass will fail and the second pass will indicate that the job passed.

  3. Check for errors in the post hook job:

    1. $ oc get pod -n clusters-sub

    Note the name of the siteconfig-post-xxxxx pod:

    1. $ oc logs -n clusters-sub siteconfig-post-xxxxx

    If the logs indicate errors, correct the conditions and push the corrected SiteConfig or PolicyGenTemplate to the Git repository.

Validating the generation of policy CRs

ArgoCD generates the policy custom resources (CRs) in the same namespace as the PolicyGenTemplate from which they were created. The same troubleshooting flow applies to all policy CRs generated from PolicyGenTemplates regardless of whether they are common, group, or site based.

To check the status of the policy CRs, enter the following commands:

  1. $ export NS=<namespace>
  1. $ oc get policy -n $NS

The returned output displays the expected set of policy wrapped CRs. If no object is returned, use the following procedure to troubleshoot the ArgoCD pipeline flow from the PolicyGenTemplate to the policy CRs.

Procedure

  1. Check the synchronization of the PolicyGenTemplate to the hub cluster:

    1. $ oc get policygentemplate -A

    or

    1. $ oc get policygentemplate -n $NS

    If the PolicyGenTemplate is not synchronized, one of the following situations has occurred:

    • The clusters application failed to synchronize the CR from the Git repository to the hub. Use the following command to verify this:

      1. $ oc describe -n openshift-gitops application clusters

      Check for Status: Synced and that the Revision: is the SHA of the commit you pushed to the subscribed repository.

    • The pre-sync hook failed, possibly due to a failure to pull the container image. Check the ArgoCD dashboard for the status of the pre-sync job in the clusters application.

  2. Ensure the policies were copied to the cluster namespace. When ACM recognizes that policies apply to a ManagedCluster, ACM applies the policy CR objects to the cluster namespace:

    1. $ oc get policy -n <clusterName>

    ACM copies all applicable common, group, and site policies here. The policy names are of the form <policyNamespace>.<policyName>.

  3. Check the placement rule for any policies not copied to the cluster namespace. The clusterSelector in the PlacementRule for those policies must match the labels on the ManagedCluster:

    1. $ oc get placementrule -n $NS
  4. Make a note of the PlacementRule name for the missing common, group, or site policy:

    1. $ oc get placementrule -n $NS <placementRuleName> -o yaml
    • The status decisions value should include your cluster name.

    • The key and values of the clusterSelector in the spec must match the labels on your managed cluster. Check the labels on the ManagedCluster:

      1. $ oc get ManagedCluster $CLUSTER -o jsonpath='{.metadata.labels}' | jq

      Example

      1. apiVersion: apps.open-cluster-management.io/v1
      2. kind: PlacementRule
      3. metadata:
      4. name: group-test1-policies-placementrules
      5. namespace: group-test1-policies
      6. spec:
      7. clusterSelector:
      8. matchExpressions:
      9. - key: group-test1
      10. operator: In
      11. values:
      12. - ""
      13. status:
      14. decisions:
      15. - clusterName: <myClusterName>
      16. clusterNamespace: <myClusterName>
  5. Ensure all policies are compliant:

    1. $ oc get policy -n $CLUSTER

    If the Namespace, OperatorGroup, and Subscription policies are compliant but the Operator configuration policies are not, it is likely that the Operators did not install on the managed cluster.
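
    To investigate further, you can check the Operator Subscription and ClusterServiceVersion status on the managed cluster. This is a minimal check, assuming you exported the managed cluster kubeconfig as described earlier:

    1. $ oc --kubeconfig <directory>/<cluster-name>-kubeconfig get subscriptions,csv -A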