Disaster recovery for a hosted cluster within an AWS region

In a situation where you need disaster recovery (DR) for a hosted cluster, you can recover a hosted cluster to the same region within AWS. For example, you need DR when the upgrade of a management cluster fails and the hosted cluster is in a read-only state.

Hosted control planes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.

The DR process involves three main steps:

  1. Backing up the hosted cluster on the source management cluster

  2. Restoring the hosted cluster on a destination management cluster

  3. Deleting the hosted cluster from the source management cluster

Your workloads remain running during the process. The Cluster API might be unavailable for a period, but that will not affect the services that are running on the worker nodes.

Both the source management cluster and the destination management cluster must be deployed with the --external-dns flags to maintain the API server URL, as shown in this example:

Example: External DNS flags
  --external-dns-provider=aws \
  --external-dns-credentials=<AWS Credentials location> \
  --external-dns-domain-filter=<DNS Base Domain>

If you do not include the --external-dns flags to maintain the API server URL, the hosted cluster cannot be migrated.

Example environment and context

Consider a scenario where you have three clusters to restore. Two are management clusters, and one is a hosted cluster. You can restore either the control plane only or the control plane and the nodes. Before you begin, you need the following information:

  • Source MGMT Namespace: The source management namespace

  • Source MGMT ClusterName: The source management cluster name

  • Source MGMT Kubeconfig: The source management kubeconfig file

  • Destination MGMT Kubeconfig: The destination management kubeconfig file

  • HC Kubeconfig: The hosted cluster kubeconfig file

  • SSH key file: The SSH public key

  • Pull secret: The pull secret file to access the release images

  • AWS credentials

  • AWS region

  • Base domain: The DNS base domain to use as an external DNS

  • S3 bucket name: The bucket in the AWS region where you plan to upload the etcd backup

This information is shown in the following example environment variables.

Example environment variables

  SSH_KEY_FILE=${HOME}/.ssh/id_rsa.pub
  BASE_PATH=${HOME}/hypershift
  BASE_DOMAIN="aws.sample.com"
  PULL_SECRET_FILE="${HOME}/pull_secret.json"
  AWS_CREDS="${HOME}/.aws/credentials"
  CONTROL_PLANE_AVAILABILITY_POLICY=SingleReplica
  HYPERSHIFT_PATH=${BASE_PATH}/src/hypershift
  HYPERSHIFT_CLI=${HYPERSHIFT_PATH}/bin/hypershift
  HYPERSHIFT_IMAGE=${HYPERSHIFT_IMAGE:-"quay.io/${USER}/hypershift:latest"}
  NODE_POOL_REPLICAS=${NODE_POOL_REPLICAS:-2}
  # MGMT Context
  MGMT_REGION=us-west-1
  MGMT_CLUSTER_NAME="${USER}-dev"
  MGMT_CLUSTER_NS=${USER}
  MGMT_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT_CLUSTER_NS}-${MGMT_CLUSTER_NAME}"
  MGMT_KUBECONFIG="${MGMT_CLUSTER_DIR}/kubeconfig"
  # MGMT2 Context
  MGMT2_CLUSTER_NAME="${USER}-dest"
  MGMT2_CLUSTER_NS=${USER}
  MGMT2_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT2_CLUSTER_NS}-${MGMT2_CLUSTER_NAME}"
  MGMT2_KUBECONFIG="${MGMT2_CLUSTER_DIR}/kubeconfig"
  # Hosted Cluster Context
  HC_CLUSTER_NS=clusters
  HC_REGION=us-west-1
  HC_CLUSTER_NAME="${USER}-hosted"
  HC_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}"
  HC_KUBECONFIG="${HC_CLUSTER_DIR}/kubeconfig"
  BACKUP_DIR=${HC_CLUSTER_DIR}/backup
  BUCKET_NAME="${USER}-hosted-${MGMT_REGION}"
  # DNS
  AWS_ZONE_ID="Z07342811SH9AA102K1AC"
  EXTERNAL_DNS_DOMAIN="hc.jpdv.aws.kerbeross.com"
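The variable names above are the ones that the remaining procedures assume. Because most of the commands fail in confusing ways if one of these variables is empty, you can verify the key variables up front. This is a hedged convenience sketch, not part of the product tooling; the check_vars function name is illustrative:

```shell
# Print every variable from the list that is unset or empty, and return
# a nonzero status if any are missing. Uses bash indirect expansion.
check_vars() {
    local missing=0 var
    for var in "$@"; do
        if [ -z "${!var}" ]; then
            echo "Missing required variable: ${var}"
            missing=1
        fi
    done
    return ${missing}
}

# Variable names taken from the example environment above.
check_vars BASE_PATH AWS_CREDS MGMT_KUBECONFIG MGMT2_KUBECONFIG \
           HC_CLUSTER_NS HC_CLUSTER_NAME BUCKET_NAME AWS_ZONE_ID \
    || echo "Set the missing variables before you continue"
```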

Overview of the backup and restore process

The backup and restore process works as follows:

  1. On management cluster 1, which you can think of as the source management cluster, the control plane and workers interact by using the ExternalDNS API.

  2. You take a snapshot of the hosted cluster, which includes etcd, the control plane, and the worker nodes. The worker nodes are moved to the external DNS, the control plane is saved in a local manifest file, and etcd is backed up to an S3 bucket.

  3. On management cluster 2, which you can think of as the destination management cluster, you restore etcd from the S3 bucket and restore the control plane from the local manifest file.

  4. By using the ExternalDNS API, the worker nodes are restored to management cluster 2.

  5. On management cluster 2, the control plane and worker nodes interact by using the ExternalDNS API.

You can manually back up and restore your hosted cluster, or you can run a script to complete the process. For more information about the script, see “Running a script to back up and restore a hosted cluster”.
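At a high level, the whole process can be driven as three stages that switch the KUBECONFIG context between the source and destination management clusters. The following skeleton only illustrates that structure; the function names are invented here, and each placeholder echo stands in for the full sequence of commands in the corresponding section:

```shell
# Skeleton of the DR flow; replace each echo with the commands from the
# matching section of this document.
backup_hc() {
    export KUBECONFIG=${MGMT_KUBECONFIG}     # source management cluster
    echo "backup: pause reconciliation, snapshot etcd, export objects"
}

restore_hc() {
    export KUBECONFIG=${MGMT2_KUBECONFIG}    # destination management cluster
    echo "restore: re-create namespaces, apply objects, restore etcd"
}

teardown_hc() {
    export KUBECONFIG=${MGMT_KUBECONFIG}     # back to the source cluster
    echo "teardown: delete the hosted cluster from the source"
}

main() {
    backup_hc
    restore_hc
    teardown_hc
}

main
```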

Backing up a hosted cluster

To recover your hosted cluster in your target management cluster, you first need to back up all of the relevant data.

Procedure

  1. Create a configmap file to declare the source management cluster by entering this command:

    $ oc create configmap mgmt-parent-cluster -n default --from-literal=from=${MGMT_CLUSTER_NAME}
  2. Shut down the reconciliation in the hosted cluster and in the node pools by entering these commands:

    PAUSED_UNTIL="true"
    oc patch -n ${HC_CLUSTER_NS} hostedclusters/${HC_CLUSTER_NAME} -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge
    NODEPOOLS=$(oc get nodepools -n ${HC_CLUSTER_NS} -o=jsonpath='{.items[?(@.spec.clusterName=="'${HC_CLUSTER_NAME}'")].metadata.name}')
    oc patch -n ${HC_CLUSTER_NS} nodepools/${NODEPOOLS} -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge
    oc scale deployment -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --replicas=0 kube-apiserver openshift-apiserver openshift-oauth-apiserver control-plane-operator
  3. Back up etcd and upload the data to an S3 bucket by running this bash script:

    Wrap this script in a function and call it from the main function.

    # ETCD Backup
    ETCD_PODS="etcd-0"
    if [ "${CONTROL_PLANE_AVAILABILITY_POLICY}" = "HighlyAvailable" ]; then
    ETCD_PODS="etcd-0 etcd-1 etcd-2"
    fi
    for POD in ${ETCD_PODS}; do
    # Create an etcd snapshot
    oc exec -it ${POD} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -- env ETCDCTL_API=3 /usr/bin/etcdctl --cacert /etc/etcd/tls/client/etcd-client-ca.crt --cert /etc/etcd/tls/client/etcd-client.crt --key /etc/etcd/tls/client/etcd-client.key --endpoints=localhost:2379 snapshot save /var/lib/data/snapshot.db
    oc exec -it ${POD} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/data/snapshot.db
    FILEPATH="/${BUCKET_NAME}/${HC_CLUSTER_NAME}-${POD}-snapshot.db"
    CONTENT_TYPE="application/x-compressed-tar"
    DATE_VALUE=`date -R`
    SIGNATURE_STRING="PUT\n\n${CONTENT_TYPE}\n${DATE_VALUE}\n${FILEPATH}"
    set +x
    ACCESS_KEY=$(grep aws_access_key_id ${AWS_CREDS} | head -n1 | cut -d= -f2 | sed "s/ //g")
    SECRET_KEY=$(grep aws_secret_access_key ${AWS_CREDS} | head -n1 | cut -d= -f2 | sed "s/ //g")
    SIGNATURE_HASH=$(echo -en ${SIGNATURE_STRING} | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64)
    set -x
    # FIXME: this is pushing to the OIDC bucket
    oc exec -it ${POD} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -- curl -X PUT -T "/var/lib/data/snapshot.db" \
    -H "Host: ${BUCKET_NAME}.s3.amazonaws.com" \
    -H "Date: ${DATE_VALUE}" \
    -H "Content-Type: ${CONTENT_TYPE}" \
    -H "Authorization: AWS ${ACCESS_KEY}:${SIGNATURE_HASH}" \
    https://${BUCKET_NAME}.s3.amazonaws.com/${HC_CLUSTER_NAME}-${POD}-snapshot.db
    done
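    The Authorization header that the script builds is an AWS Signature Version 2-style signature: an HMAC-SHA1 over the request method, content type, date, and resource path, encoded in base64. The signing step can be isolated and checked on its own; the function name and the key value below are illustrative only:

```shell
# Reproduce the signature computation from the script above for an S3
# PUT request, using AWS Signature Version 2 (HMAC-SHA1, base64-encoded).
s3_sigv2_signature() {
    local secret_key=$1 content_type=$2 date_value=$3 resource=$4
    local string_to_sign
    string_to_sign=$(printf 'PUT\n\n%s\n%s\n%s' "${content_type}" "${date_value}" "${resource}")
    printf '%s' "${string_to_sign}" | openssl sha1 -hmac "${secret_key}" -binary | base64
}

# Illustrative values; a real request uses the secret key from ${AWS_CREDS}
# and the output of `date -R`.
s3_sigv2_signature "examplesecretkey" "application/x-compressed-tar" \
    "Tue, 27 Mar 2007 19:36:42 +0000" "/mybucket/cluster-etcd-0-snapshot.db"
```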

    For more information about backing up etcd, see “Backing up and restoring etcd on a hosted cluster”.

  4. Back up Kubernetes and OKD objects by entering the following commands. You need to back up the following objects:

    • HostedCluster and NodePool objects from the HostedCluster namespace

    • HostedCluster secrets from the HostedCluster namespace

    • HostedControlPlane from the Hosted Control Plane namespace

    • Cluster from the Hosted Control Plane namespace

    • AWSCluster, AWSMachineTemplate, and AWSMachine from the Hosted Control Plane namespace

    • MachineDeployments, MachineSets, and Machines from the Hosted Control Plane namespace

    • ControlPlane secrets from the Hosted Control Plane namespace

      mkdir -p ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS} ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}
      chmod 700 ${BACKUP_DIR}/namespaces/
      # HostedCluster
      echo "Backing Up HostedCluster Objects:"
      oc get hc ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS} -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}.yaml
      echo "--> HostedCluster"
      sed -i '' -e '/^status:$/,$d' ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}.yaml
      # NodePool
      oc get np ${NODEPOOLS} -n ${HC_CLUSTER_NS} -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-${NODEPOOLS}.yaml
      echo "--> NodePool"
      sed -i '' -e '/^status:$/,$d' ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-${NODEPOOLS}.yaml
      # Secrets in the HC Namespace
      echo "--> HostedCluster Secrets:"
      for s in $(oc get secret -n ${HC_CLUSTER_NS} | grep "^${HC_CLUSTER_NAME}" | awk '{print $1}'); do
      oc get secret -n ${HC_CLUSTER_NS} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/secret-${s}.yaml
      done
      # Secrets in the HC Control Plane Namespace
      echo "--> HostedCluster ControlPlane Secrets:"
      for s in $(oc get secret -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} | egrep -v "docker|service-account-token|oauth-openshift|NAME|token-${HC_CLUSTER_NAME}" | awk '{print $1}'); do
      oc get secret -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/secret-${s}.yaml
      done
      # Hosted Control Plane
      echo "--> HostedControlPlane:"
      oc get hcp ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/hcp-${HC_CLUSTER_NAME}.yaml
      # Cluster
      echo "--> Cluster:"
      CL_NAME=$(oc get hcp ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o jsonpath={.metadata.labels.\*} | grep ${HC_CLUSTER_NAME})
      oc get cluster ${CL_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/cl-${HC_CLUSTER_NAME}.yaml
      # AWS Cluster
      echo "--> AWS Cluster:"
      oc get awscluster ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awscl-${HC_CLUSTER_NAME}.yaml
      # AWS MachineTemplate
      echo "--> AWS Machine Template:"
      oc get awsmachinetemplate ${NODEPOOLS} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsmt-${HC_CLUSTER_NAME}.yaml
      # AWS Machines
      echo "--> AWS Machine:"
      CL_NAME=$(oc get hcp ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o jsonpath={.metadata.labels.\*} | grep ${HC_CLUSTER_NAME})
      for s in $(oc get awsmachines -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --no-headers | grep ${CL_NAME} | cut -f1 -d\ ); do
      oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} awsmachines $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsm-${s}.yaml
      done
      # MachineDeployments
      echo "--> HostedCluster MachineDeployments:"
      for s in $(oc get machinedeployment -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
      mdp_name=$(echo ${s} | cut -f 2 -d /)
      oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machinedeployment-${mdp_name}.yaml
      done
      # MachineSets
      echo "--> HostedCluster MachineSets:"
      for s in $(oc get machineset -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
      ms_name=$(echo ${s} | cut -f 2 -d /)
      oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machineset-${ms_name}.yaml
      done
      # Machines
      echo "--> HostedCluster Machine:"
      for s in $(oc get machine -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
      m_name=$(echo ${s} | cut -f 2 -d /)
      oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machine-${m_name}.yaml
      done
  5. Clean up the ControlPlane routes by entering this command:

    $ oc delete routes -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all

    By entering that command, you enable the ExternalDNS Operator to delete the Route53 entries.

  6. Verify that the Route53 entries are clean by running this script:

    function clean_routes() {
    if [[ -z "${1}" ]];then
    echo "Give me the NS where to clean the routes"
    exit 1
    fi
    # Constants
    if [[ -z "${2}" ]];then
    echo "Give me the Route53 zone ID"
    exit 1
    fi
    ZONE_ID=${2}
    ROUTES=10
    timeout=40
    count=0
    # This allows us to remove the ownership in the AWS for the API route
    oc delete route -n ${1} --all
    while [ ${ROUTES} -gt 2 ]
    do
    echo "Waiting for ExternalDNS Operator to clean the DNS Records in AWS Route53 where the zone id is: ${ZONE_ID}..."
    echo "Try: (${count}/${timeout})"
    sleep 10
    if [[ $count -eq timeout ]];then
    echo "Timeout waiting for cleaning the Route53 DNS records"
    exit 1
    fi
    count=$((count+1))
    ROUTES=$(aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_ID} --max-items 10000 --output json | grep -c ${EXTERNAL_DNS_DOMAIN})
    done
    }
    # SAMPLE: clean_routes "<HC ControlPlane Namespace>" "<AWS_ZONE_ID>"
    clean_routes "${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}" "${AWS_ZONE_ID}"

Verification

Check all of the OKD objects and the S3 bucket to verify that everything looks as expected.

Next steps

Restore your hosted cluster.

Restoring a hosted cluster

Gather all of the objects that you backed up and restore them in your destination management cluster.

Prerequisites

You backed up the data from your source management cluster.

Ensure that the KUBECONFIG variable points to the kubeconfig file of the destination management cluster or, if you use the script, that the MGMT2_KUBECONFIG variable does. Enter export KUBECONFIG=<Kubeconfig FilePath> or, if you use the script, enter export KUBECONFIG=${MGMT2_KUBECONFIG}.

Procedure

  1. Verify that the new management cluster does not contain any namespaces from the cluster that you are restoring by entering these commands:

    # Just in case
    export KUBECONFIG=${MGMT2_KUBECONFIG}
    BACKUP_DIR=${HC_CLUSTER_DIR}/backup
    # Namespace deletion in the destination Management cluster
    oc delete ns ${HC_CLUSTER_NS} || true
    oc delete ns ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} || true
  2. Re-create the deleted namespaces by entering these commands:

    # Namespace creation
    $ oc new-project ${HC_CLUSTER_NS}
    $ oc new-project ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}
  3. Restore the secrets in the HC namespace by entering this command:

    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/secret-*
  4. Restore the objects in the HostedCluster control plane namespace by entering these commands:

    # Secrets
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/secret-*
    # HostedControlPlane
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/hcp-*
    # Cluster
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/cl-*
  5. If you are recovering the nodes and the node pool to reuse AWS instances, restore the objects in the HC control plane namespace by entering these commands:

    # AWS
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awscl-*
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsmt-*
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsm-*
    # Machines
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machinedeployment-*
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machineset-*
    $ oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machine-*
  6. Restore the etcd data and the hosted cluster by running this bash script:

    ETCD_PODS="etcd-0"
    if [ "${CONTROL_PLANE_AVAILABILITY_POLICY}" = "HighlyAvailable" ]; then
    ETCD_PODS="etcd-0 etcd-1 etcd-2"
    fi
    HC_RESTORE_FILE=${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}-restore.yaml
    HC_BACKUP_FILE=${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}.yaml
    HC_NEW_FILE=${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}-new.yaml
    cat ${HC_BACKUP_FILE} > ${HC_NEW_FILE}
    cat > ${HC_RESTORE_FILE} <<EOF
    restoreSnapshotURL:
    EOF
    for POD in ${ETCD_PODS}; do
    # Create a pre-signed URL for the etcd snapshot
    ETCD_SNAPSHOT="s3://${BUCKET_NAME}/${HC_CLUSTER_NAME}-${POD}-snapshot.db"
    ETCD_SNAPSHOT_URL=$(AWS_DEFAULT_REGION=${MGMT2_REGION} aws s3 presign ${ETCD_SNAPSHOT})
    # FIXME no CLI support for restoreSnapshotURL yet
    cat >> ${HC_RESTORE_FILE} <<EOF
    - "${ETCD_SNAPSHOT_URL}"
    EOF
    done
    cat ${HC_RESTORE_FILE}
    if ! grep ${HC_CLUSTER_NAME}-snapshot.db ${HC_NEW_FILE}; then
    sed -i '' -e "/type: PersistentVolume/r ${HC_RESTORE_FILE}" ${HC_NEW_FILE}
    sed -i '' -e '/pausedUntil:/d' ${HC_NEW_FILE}
    fi
    HC=$(oc get hc -n ${HC_CLUSTER_NS} ${HC_CLUSTER_NAME} -o name || true)
    if [[ ${HC} == "" ]];then
    echo "Deploying HC Cluster: ${HC_CLUSTER_NAME} in ${HC_CLUSTER_NS} namespace"
    oc apply -f ${HC_NEW_FILE}
    else
    echo "HC Cluster ${HC_CLUSTER_NAME} already exists, avoiding step"
    fi
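    For reference, the pre-signed URLs that the script collects end up in the managed etcd storage section of the HostedCluster manifest. The intended result resembles the following fragment; the URL and the storage size shown here are illustrative placeholders:

```yaml
spec:
  etcd:
    managed:
      storage:
        persistentVolume:
          size: 4Gi
        type: PersistentVolume
        restoreSnapshotURL:
        - "https://<bucket_name>.s3.amazonaws.com/<hosted_cluster_name>-etcd-0-snapshot.db?<presigned_query_string>"
    managementType: Managed
```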
  7. If you are recovering the nodes and the node pool to reuse AWS instances, restore the node pool by entering this command:

    oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-*

Verification

  • To verify that the nodes are fully restored, use this function:

    timeout=40
    count=0
    NODE_STATUS=$(oc get nodes --kubeconfig=${HC_KUBECONFIG} | grep -v NotReady | grep -c "worker") || NODE_STATUS=0
    while [ ${NODE_POOL_REPLICAS} != ${NODE_STATUS} ]
    do
    echo "Waiting for Nodes to be Ready in the destination MGMT Cluster: ${MGMT2_CLUSTER_NAME}"
    echo "Try: (${count}/${timeout})"
    sleep 30
    if [[ $count -eq timeout ]];then
    echo "Timeout waiting for Nodes in the destination MGMT Cluster"
    exit 1
    fi
    count=$((count+1))
    NODE_STATUS=$(oc get nodes --kubeconfig=${HC_KUBECONFIG} | grep -v NotReady | grep -c "worker") || NODE_STATUS=0
    done

Next steps

Shut down and delete your cluster.

Deleting a hosted cluster from your source management cluster

After you back up your hosted cluster and restore it to your destination management cluster, you shut down and delete the hosted cluster on your source management cluster.

Prerequisites

You backed up the data from your source management cluster and restored it to your destination management cluster.

Ensure that the KUBECONFIG variable points to the kubeconfig file of the source management cluster or, if you use the script, that the MGMT_KUBECONFIG variable does. Enter export KUBECONFIG=<Kubeconfig FilePath> or, if you use the script, enter export KUBECONFIG=${MGMT_KUBECONFIG}.

Procedure

  1. Scale the deployment and statefulset objects by entering these commands:

    # Just in case
    export KUBECONFIG=${MGMT_KUBECONFIG}
    # Scale down deployments
    oc scale deployment -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --replicas=0 --all
    oc scale statefulset.apps -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --replicas=0 --all
    sleep 15
  2. Delete the NodePool objects by entering these commands:

    NODEPOOLS=$(oc get nodepools -n ${HC_CLUSTER_NS} -o=jsonpath='{.items[?(@.spec.clusterName=="'${HC_CLUSTER_NAME}'")].metadata.name}')
    if [[ ! -z "${NODEPOOLS}" ]];then
    oc patch -n "${HC_CLUSTER_NS}" nodepool ${NODEPOOLS} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'
    oc delete np -n ${HC_CLUSTER_NS} ${NODEPOOLS}
    fi
  3. Delete the machine and machineset objects by entering these commands:

    # Machines
    for m in $(oc get machines -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
    oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]' || true
    oc delete -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} || true
    done
    oc delete machineset -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all || true
  4. Delete the cluster object by entering these commands:

    # Cluster
    C_NAME=$(oc get cluster -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name)
    oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${C_NAME} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'
    oc delete cluster.cluster.x-k8s.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all
  5. Delete the AWS machines (Kubernetes objects) by entering these commands. Deleting these objects removes only the Kubernetes resources; the actual AWS cloud instances are not affected.

    # AWS Machines
    for m in $(oc get awsmachine.infrastructure.cluster.x-k8s.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name)
    do
    oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]' || true
    oc delete -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} || true
    done
  6. Delete the HostedControlPlane and ControlPlane HC namespace objects by entering these commands:

    # Delete HCP and ControlPlane HC NS
    oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} hostedcontrolplane.hypershift.openshift.io ${HC_CLUSTER_NAME} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'
    oc delete hostedcontrolplane.hypershift.openshift.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all
    oc delete ns ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} || true
  7. Delete the HostedCluster and HC namespace objects by entering these commands:

    # Delete HC and HC Namespace
    oc -n ${HC_CLUSTER_NS} patch hostedclusters ${HC_CLUSTER_NAME} -p '{"metadata":{"finalizers":null}}' --type merge || true
    oc delete hc -n ${HC_CLUSTER_NS} ${HC_CLUSTER_NAME} || true
    oc delete ns ${HC_CLUSTER_NS} || true

Verification

  • To verify that everything works, enter these commands:

    # Validations
    export KUBECONFIG=${MGMT2_KUBECONFIG}
    oc get hc -n ${HC_CLUSTER_NS}
    oc get np -n ${HC_CLUSTER_NS}
    oc get pod -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}
    oc get machines -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}
    # Inside the HostedCluster
    export KUBECONFIG=${HC_KUBECONFIG}
    oc get clusterversion
    oc get nodes

Next steps

Delete the OVN pods in the hosted cluster so that you can connect to the new OVN control plane that runs in the new management cluster:

  1. Load the KUBECONFIG environment variable with the hosted cluster’s kubeconfig path.

  2. Enter this command:

    $ oc delete pod -n openshift-ovn-kubernetes --all

Running a script to back up and restore a hosted cluster

To expedite the process to back up a hosted cluster and restore it within the same region on AWS, you can modify and run a script.

Procedure

  1. Replace the variables in the following script with your information:

    # Fill in the common variables to fit your environment; this is just a sample
    SSH_KEY_FILE=${HOME}/.ssh/id_rsa.pub
    BASE_PATH=${HOME}/hypershift
    BASE_DOMAIN="aws.sample.com"
    PULL_SECRET_FILE="${HOME}/pull_secret.json"
    AWS_CREDS="${HOME}/.aws/credentials"
    CONTROL_PLANE_AVAILABILITY_POLICY=SingleReplica
    HYPERSHIFT_PATH=${BASE_PATH}/src/hypershift
    HYPERSHIFT_CLI=${HYPERSHIFT_PATH}/bin/hypershift
    HYPERSHIFT_IMAGE=${HYPERSHIFT_IMAGE:-"quay.io/${USER}/hypershift:latest"}
    NODE_POOL_REPLICAS=${NODE_POOL_REPLICAS:-2}
    # MGMT Context
    MGMT_REGION=us-west-1
    MGMT_CLUSTER_NAME="${USER}-dev"
    MGMT_CLUSTER_NS=${USER}
    MGMT_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT_CLUSTER_NS}-${MGMT_CLUSTER_NAME}"
    MGMT_KUBECONFIG="${MGMT_CLUSTER_DIR}/kubeconfig"
    # MGMT2 Context
    MGMT2_CLUSTER_NAME="${USER}-dest"
    MGMT2_CLUSTER_NS=${USER}
    MGMT2_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT2_CLUSTER_NS}-${MGMT2_CLUSTER_NAME}"
    MGMT2_KUBECONFIG="${MGMT2_CLUSTER_DIR}/kubeconfig"
    # Hosted Cluster Context
    HC_CLUSTER_NS=clusters
    HC_REGION=us-west-1
    HC_CLUSTER_NAME="${USER}-hosted"
    HC_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}"
    HC_KUBECONFIG="${HC_CLUSTER_DIR}/kubeconfig"
    BACKUP_DIR=${HC_CLUSTER_DIR}/backup
    BUCKET_NAME="${USER}-hosted-${MGMT_REGION}"
    # DNS
    AWS_ZONE_ID="Z026552815SS3YPH9H6MG"
    EXTERNAL_DNS_DOMAIN="guest.jpdv.aws.kerbeross.com"
  2. Save the script to your local file system.

  3. Run the script by entering the following command:

    source <env_file>

    where <env_file> is the name of the file where you saved the script.