Installation on OpenShift OKD

OpenShift Requirements

  1. Choose a preferred cloud provider. This guide was tested on AWS, Azure, and GCP from a Linux host.
  2. Read OpenShift documentation to find out about provider-specific prerequisites.
  3. Get OpenShift Installer.

Note

Unless you have installed OpenShift in the past, it is highly recommended to read the OpenShift documentation. Here are a few notes that you may find useful.

  • With the AWS provider, openshift-install will not work properly when MFA credentials are stored in ~/.aws/credentials; traditional credentials are required.
  • With the Azure provider, openshift-install will prompt for credentials and store them in ~/.azure/osServicePrincipal.json; it does not simply pick up az login credentials. It is recommended to set up a dedicated service principal and use it.
  • With the GCP provider, openshift-install will only work with a service account key, which has to be set using the GOOGLE_CREDENTIALS environment variable (e.g. GOOGLE_CREDENTIALS=service-account.json); see the example after this list. Follow the OpenShift Installer documentation to assign the required roles to your service account.
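
For example, the key can be exported for the current shell session before running openshift-install (a minimal sketch; service-account.json is a placeholder for your own key file):

  # Point openshift-install at a GCP service account key (placeholder file name).
  export GOOGLE_CREDENTIALS="${HOME}/service-account.json"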

Create an OpenShift OKD Cluster

First, set the cluster name:

  CLUSTER_NAME="cluster-1"

Now, create configuration files:

Note

The sample output below shows the AWS provider, but the process works the same way with other providers.

  $ openshift-install create install-config --dir "${CLUSTER_NAME}"
  ? SSH Public Key ~/.ssh/id_rsa.pub
  ? Platform aws
  INFO Credentials loaded from default AWS environment variables
  ? Region eu-west-1
  ? Base Domain openshift-test-1.cilium.rocks
  ? Cluster Name cluster-1
  ? Pull Secret [? for help] **********************************

Next, set networkType: Cilium in the generated install-config.yaml:

  sed -i "s/networkType: .*/networkType: Cilium/" "${CLUSTER_NAME}/install-config.yaml"
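
You can verify the change afterwards (optional check):

  # Should print "networkType: Cilium".
  grep 'networkType:' "${CLUSTER_NAME}/install-config.yaml"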

The resulting configuration will look like this:

  apiVersion: v1
  baseDomain: ilya-openshift-test-1.cilium.rocks
  compute:
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    platform: {}
    replicas: 3
  controlPlane:
    architecture: amd64
    hyperthreading: Enabled
    name: master
    platform: {}
    replicas: 3
  metadata:
    creationTimestamp: null
    name: cluster-1
  networking:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    machineNetwork:
    - cidr: 10.0.0.0/16
    networkType: Cilium
    serviceNetwork:
    - 172.30.0.0/16
  platform:
    aws:
      region: eu-west-1
  publish: External
  pullSecret: '{"auths":{"fake":{"auth": "bar"}}}'
  sshKey: |
    ssh-rsa <REDACTED>

You may wish to make a few changes, e.g. increase the number of nodes, as shown in the sketch below.
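
For example, the worker count can be raised with a one-liner like the following sketch, which assumes yq v4 is installed (the replica count of 5 is only an example); you can of course edit the file by hand instead:

  # Increase the number of compute (worker) nodes to 5 (requires yq v4).
  yq -i '.compute[0].replicas = 5' "${CLUSTER_NAME}/install-config.yaml"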

If you do change any of the CIDRs, you will need to make sure that the Helm values in ${CLUSTER_NAME}/manifests/cluster-network-07-cilium-ciliumconfig.yaml reflect those changes. Namely, clusterNetwork should match nativeRoutingCIDR, clusterPoolIPv4PodCIDR, and clusterPoolIPv4MaskSize. Also make sure that the clusterNetwork does not conflict with machineNetwork (which represents the VPC CIDR in AWS).
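
Once the manifests have been generated and the Cilium manifests copied in (both steps follow below), an easy way to compare the two files is to grep for the relevant keys (an optional check; the field names are the ones listed above):

  # Print the CIDR-related settings from install-config.yaml...
  grep -E 'cidr|hostPrefix|networkType' "${CLUSTER_NAME}/install-config.yaml"
  # ...and the corresponding Helm values in the CiliumConfig manifest.
  grep -E 'nativeRoutingCIDR|clusterPoolIPv4PodCIDR|clusterPoolIPv4MaskSize' \
      "${CLUSTER_NAME}/manifests/cluster-network-07-cilium-ciliumconfig.yaml"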

Warning

Ensure that there are multiple controlPlane replicas. A single controlPlane replica will cause cluster bootstrap to fail during installation.

Next, generate OpenShift manifests:

  openshift-install create manifests --dir "${CLUSTER_NAME}"

Next, obtain the Cilium manifests from the cilium/cilium-olm repository and copy them to ${CLUSTER_NAME}/manifests:

  cilium_olm_rev="master"
  cilium_version="|release|"   # the Cilium release version to install
  curl --silent --location --fail --show-error "https://github.com/cilium/cilium-olm/archive/${cilium_olm_rev}.tar.gz" --output /tmp/cilium-olm.tgz
  tar -C /tmp -xf /tmp/cilium-olm.tgz
  cp /tmp/cilium-olm-${cilium_olm_rev}/manifests/cilium.v${cilium_version}/* "${CLUSTER_NAME}/manifests"
  rm -rf -- /tmp/cilium-olm.tgz "/tmp/cilium-olm-${cilium_olm_rev}"

At this stage, the manifests directory contains everything needed to install Cilium. To get a list of the Cilium manifests, run:

  ls ${CLUSTER_NAME}/manifests/cluster-network-*-cilium-*

You can set any custom Helm values by editing ${CLUSTER_NAME}/manifests/cluster-network-07-cilium-ciliumconfig.yaml.

It is also possible to update Helm values once the cluster is running by changing the CiliumConfig object, e.g. with kubectl edit ciliumconfig -n cilium cilium. You may need to restart the Cilium agent pods for certain options to take effect, as shown below.
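
For example, the agent pods can be restarted by rolling the DaemonSet (a sketch; it assumes the agent DaemonSet is named cilium and runs in the cilium namespace, matching the CiliumConfig commands above):

  # Restart the Cilium agent pods so that updated Helm values take effect.
  kubectl -n cilium rollout restart daemonset/cilium
  kubectl -n cilium rollout status daemonset/cilium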

Note

If you are not using a real OpenShift pull secret, you will not be able to install the Cilium OLM operator from the Red Hat registry. You can fix this by running:

  sed -i 's|image:\ registry.connect.redhat.com/isovalent/|image:\ quay.io/cilium/|g' \
    "${CLUSTER_NAME}/manifests/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml" \
    ${CLUSTER_NAME}/manifests/cluster-network-06-cilium-00014-cilium.*-clusterserviceversion.yaml
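
Afterwards you can confirm that the affected manifests now reference quay.io images (optional check):

  # Every image reference should now point at quay.io/cilium/ instead of registry.connect.redhat.com.
  grep 'image:' ${CLUSTER_NAME}/manifests/cluster-network-06-cilium-*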

Create the cluster:

Note

The sample output below shows the AWS provider, but the process works the same way with other providers.

  $ openshift-install create cluster --dir "${CLUSTER_NAME}"
  INFO Consuming OpenShift Install (Manifests) from target directory
  INFO Consuming Master Machines from target directory
  INFO Consuming Worker Machines from target directory
  INFO Consuming Openshift Manifests from target directory
  INFO Consuming Common Manifests from target directory
  INFO Credentials loaded from the "default" profile in file "/home/twp/.aws/credentials"
  INFO Creating infrastructure resources...
  INFO Waiting up to 20m0s for the Kubernetes API at https://api.cluster-name.ilya-openshift-test-1.cilium.rocks:6443...
  INFO API v1.20.0-1058+7d0a2b269a2741-dirty up
  INFO Waiting up to 30m0s for bootstrapping to complete...
  INFO Destroying the bootstrap resources...
  INFO Waiting up to 40m0s for the cluster at https://api.cluster-name.ilya-openshift-test-1.cilium.rocks:6443 to initialize...
  INFO Waiting up to 10m0s for the openshift-console route to be created...
  INFO Install complete!
  INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/twp/okd/cluster-name/auth/kubeconfig'
  INFO Access the OpenShift web-console here: https://console-openshift-console.apps.cluster-name.ilya-openshift-test-1.cilium.rocks
  INFO Login to the console with user: "kubeadmin", and password: "<REDACTED>"
  INFO Time elapsed: 32m9s

Next, the firewall configuration must be updated to allow the ports Cilium requires (see Firewall Rules). openshift-install does not support custom firewall rules, so you will need to use one of the following scripts if you are using AWS or GCP. Azure does not need additional configuration.

Warning

You need to execute the following commands to configure the firewall rules just after INFO Waiting up to 30m0s for bootstrapping to complete... appears in the logs, or the installation will fail. It is safe to apply these changes only once; OpenShift will not override them.

AWS: enable Cilium ports

This script depends on jq and the AWS CLI (aws). Make sure to run it from the same working directory where the ${CLUSTER_NAME} directory is present.

  infraID="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.infraID')"
  aws_region="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.aws.region')"
  cluster_tag="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.aws.identifier[0] | to_entries | "Name=tag:\(.[0].key),Values=\(.[0].value)"')"
  worker_sg="$(aws ec2 describe-security-groups --region "${aws_region}" --filters "${cluster_tag}" "Name=tag:Name,Values=${infraID}-worker-sg" | jq -r '.SecurityGroups[0].GroupId')"
  master_sg="$(aws ec2 describe-security-groups --region "${aws_region}" --filters "${cluster_tag}" "Name=tag:Name,Values=${infraID}-master-sg" | jq -r '.SecurityGroups[0].GroupId')"
  aws ec2 authorize-security-group-ingress --region "${aws_region}" \
    --ip-permissions \
      "IpProtocol=udp,FromPort=8472,ToPort=8472,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
      "IpProtocol=tcp,FromPort=4240,ToPort=4240,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
    --group-id "${worker_sg}"
  aws ec2 authorize-security-group-ingress --region "${aws_region}" \
    --ip-permissions \
      "IpProtocol=udp,FromPort=8472,ToPort=8472,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
      "IpProtocol=tcp,FromPort=4240,ToPort=4240,UserIdGroupPairs=[{GroupId=${worker_sg}},{GroupId=${master_sg}}]" \
    --group-id "${master_sg}"
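
If you want to confirm that the rules were added, the worker security group can be inspected using the same variables (an optional check):

  # List the ingress rules that were just added to the worker security group.
  aws ec2 describe-security-groups --region "${aws_region}" --group-ids "${worker_sg}" \
      | jq '.SecurityGroups[0].IpPermissions[] | select(.FromPort == 4240 or .FromPort == 8472)'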

GCP: enable Cilium ports

This script depends on jq and the Google Cloud SDK (gcloud). Make sure to run it from the same working directory where the ${CLUSTER_NAME} directory is present.

  infraID="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.infraID')"
  gcp_projectID="$(jq -r < "${CLUSTER_NAME}/metadata.json" '.gcp.projectID')"
  gcloud compute firewall-rules create \
    --project="${gcp_projectID}" \
    --network="${infraID}-network" \
    --allow=tcp:4240,udp:8472,icmp \
    --source-tags="${infraID}-worker,${infraID}-master" \
    --target-tags="${infraID}-worker,${infraID}-master" \
    "${infraID}-cilium"

Azure: enable Cilium ports

No additional configuration is needed.

Accessing the cluster

To access the cluster, you will need to use the kubeconfig file from the ${CLUSTER_NAME}/auth directory:

  export KUBECONFIG="${CLUSTER_NAME}/auth/kubeconfig"
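
As a quick sanity check that the cluster is reachable and Cilium is running, list the nodes and the pods in the cilium namespace (the namespace used by the CiliumConfig commands earlier in this guide):

  # Verify API access and check the Cilium pods.
  kubectl get nodes
  kubectl -n cilium get pods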

Prepare cluster for Cilium connectivity test

In order for the Cilium connectivity test pods to run on OpenShift, a simple custom SecurityContextConstraints object is required. It allows hostPort/hostNetwork, which some of the connectivity test pods rely on; it sets only allowHostPorts and allowHostNetwork without granting any other privileges.

  kubectl apply -f - <<EOF
  apiVersion: security.openshift.io/v1
  kind: SecurityContextConstraints
  metadata:
    name: cilium-test
  allowHostPorts: true
  allowHostNetwork: true
  users:
    - system:serviceaccount:cilium-test:default
  priority: null
  readOnlyRootFilesystem: false
  runAsUser:
    type: MustRunAsRange
  seLinuxContext:
    type: MustRunAs
  volumes: null
  allowHostDirVolumePlugin: false
  allowHostIPC: false
  allowHostPID: false
  allowPrivilegeEscalation: false
  allowPrivilegedContainer: false
  allowedCapabilities: null
  defaultAddCapabilities: null
  requiredDropCapabilities: null
  groups: null
  EOF
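
You can confirm that the object was created before running the test (optional):

  # The new SecurityContextConstraints should be listed.
  kubectl get scc cilium-test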

Deploy the connectivity test

You can deploy the “connectivity-check” to test connectivity between pods. It is recommended to create a separate namespace for this.

  kubectl create ns cilium-test

Deploy the check with:

  kubectl apply -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/v1.10/examples/kubernetes/connectivity-check/connectivity-check.yaml

It will deploy a series of deployments which will use various connectivity paths to connect to each other. The connectivity paths include variants with and without service load-balancing, as well as various network policy combinations. The pod name indicates the connectivity variant, and the readiness and liveness gates indicate success or failure of the test:

  $ kubectl get pods -n cilium-test
  NAME READY STATUS RESTARTS AGE
  echo-a-76c5d9bd76-q8d99 1/1 Running 0 66s
  echo-b-795c4b4f76-9wrrx 1/1 Running 0 66s
  echo-b-host-6b7fc94b7c-xtsff 1/1 Running 0 66s
  host-to-b-multi-node-clusterip-85476cd779-bpg4b 1/1 Running 0 66s
  host-to-b-multi-node-headless-dc6c44cb5-8jdz8 1/1 Running 0 65s
  pod-to-a-79546bc469-rl2qq 1/1 Running 0 66s
  pod-to-a-allowed-cnp-58b7f7fb8f-lkq7p 1/1 Running 0 66s
  pod-to-a-denied-cnp-6967cb6f7f-7h9fn 1/1 Running 0 66s
  pod-to-b-intra-node-nodeport-9b487cf89-6ptrt 1/1 Running 0 65s
  pod-to-b-multi-node-clusterip-7db5dfdcf7-jkjpw 1/1 Running 0 66s
  pod-to-b-multi-node-headless-7d44b85d69-mtscc 1/1 Running 0 66s
  pod-to-b-multi-node-nodeport-7ffc76db7c-rrw82 1/1 Running 0 65s
  pod-to-external-1111-d56f47579-d79dz 1/1 Running 0 66s
  pod-to-external-fqdn-allow-google-cnp-78986f4bcf-btjn7 1/1 Running 0 66s
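
If you prefer to block until the test pods report ready instead of polling kubectl get pods, a kubectl wait invocation along these lines can be used (a sketch; the 5-minute timeout is arbitrary, and on a single-node cluster it will time out because some pods stay Pending, see the note below):

  # Wait for every connectivity-check pod in cilium-test to become Ready.
  kubectl wait -n cilium-test --for=condition=Ready pod --all --timeout=5m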

Note

If you deploy the connectivity check to a single node cluster, pods that check multi-node functionalities will remain in the Pending state. This is expected since these pods need at least 2 nodes to be scheduled successfully.

Once done with the test, remove the cilium-test namespace:

  kubectl delete ns cilium-test

Cleanup after connectivity test

Remove the SecurityContextConstraints:

  kubectl delete scc cilium-test

Delete the cluster

  openshift-install destroy cluster --dir="${CLUSTER_NAME}"