Troubleshooting

This section describes resources for troubleshooting the Migration Toolkit for Containers (MTC).

Logs and debugging tools

This section describes logs and debugging tools that you can use for troubleshooting.

Viewing migration plan resources

You can view migration plan resources to monitor a running migration or to troubleshoot a failed migration by using the MTC web console and the command line interface (CLI).

Procedure

  1. In the MTC web console, click Migration Plans.

  2. Click the Migrations number next to a migration plan to view the Migrations page.

  3. Click a migration to view the Migration details.

  4. Expand Migration resources to view the migration resources and their status in a tree view.

    To troubleshoot a failed migration, start with a high-level resource that has failed and then work down the resource tree towards the lower-level resources.

  5. Click the Options menu (kebab) next to a resource and select one of the following options:

    • Copy oc describe command copies the command to your clipboard.

      • Log in to the relevant cluster and then run the command.

        The conditions and events of the resource are displayed in YAML format.

    • Copy oc logs command copies the command to your clipboard.

      • Log in to the relevant cluster and then run the command.

        If the resource supports log filtering, a filtered log is displayed.

    • View JSON displays the resource data in JSON format in a web browser.

      The data is the same as the output for the oc get <resource> command.

Viewing a migration plan log

You can view an aggregated log for a migration plan. You use the MTC web console to copy a command to your clipboard and then run the command from the command line interface (CLI).

The command displays the filtered logs of the following pods:

  • Migration Controller

  • Velero

  • Restic

  • Rsync

  • Stunnel

  • Registry

Procedure

  1. In the MTC web console, click Migration Plans.

  2. Click the Migrations number next to a migration plan.

  3. Click View logs.

  4. Click the Copy icon to copy the oc logs command to your clipboard.

  5. Log in to the relevant cluster and enter the command on the CLI.

    The aggregated log for the migration plan is displayed.

Using the migration log reader

You can use the migration log reader to display a single filtered view of all the migration logs.

Procedure

  1. Get the mig-log-reader pod:

    $ oc -n openshift-migration get pods | grep log

  2. Enter the following command to display a single migration log:

    $ oc -n openshift-migration logs -f <mig-log-reader-pod> -c color (1)

    (1) The -c plain option displays the log without colors.
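The pod lookup in step 1 can be sketched as a small filter. This is an illustration, not part of MTC: find_log_reader is an invented helper, and the pod listing below is a made-up stand-in for real oc get pods output.

```shell
# find_log_reader: print the name of the mig-log-reader pod from an
# "oc get pods" listing read on stdin (invented helper, for illustration).
find_log_reader() { awk '/mig-log-reader/ {print $1; exit}'; }

# Sample listing (invented) standing in for: oc -n openshift-migration get pods
printf 'velero-7f4c8-abcde 1/1 Running\nmig-log-reader-5d9b6-xyz12 1/1 Running\n' \
  | find_log_reader
```

On a live cluster, you would pipe the real listing instead: oc -n openshift-migration get pods | find_log_reader.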

Using the must-gather tool

You can collect logs, metrics, and information about MTC custom resources by using the must-gather tool.

The must-gather data must be attached to all customer cases.

You can collect data for a one-hour or a 24-hour period and view the data with the Prometheus console.

Prerequisites

  • You must be logged in to the OKD cluster as a user with the cluster-admin role.

  • You must have the OpenShift CLI installed.

Procedure

  1. Navigate to the directory where you want to store the must-gather data.

  2. Run the oc adm must-gather command:

    • To gather data for the past hour:

      $ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.6

      The data is saved as /must-gather/must-gather.tar.gz. You can upload this file to a support case on the Red Hat Customer Portal.

    • To gather data for the past 24 hours:

      $ oc adm must-gather --image=registry.redhat.io/rhmtc/openshift-migration-must-gather-rhel8:v1.6 \
        -- /usr/bin/gather_metrics_dump

      This operation can take a long time. The data is saved as /must-gather/metrics/prom_data.tar.gz. You can view this file with the Prometheus console.
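Before uploading an archive, it can help to list its contents as a sanity check. The sketch below builds a stand-in archive, because the real must-gather output depends on your cluster; only the tar commands carry over to the real file.

```shell
# Build a stand-in archive (the real one is created by oc adm must-gather).
mkdir -p /tmp/mg-demo
echo 'sample log line' > /tmp/mg-demo/gather.log
tar -czf /tmp/mg-demo/must-gather.tar.gz -C /tmp/mg-demo gather.log

# List the archive contents without extracting, as you would before uploading.
tar -tzf /tmp/mg-demo/must-gather.tar.gz
```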

To view data with the Prometheus console

  1. Create a local Prometheus instance:

    $ make prometheus-run

    The command outputs the Prometheus URL:

    Output

    Started Prometheus on http://localhost:9090

  2. Launch a web browser and navigate to the URL to view the data by using the Prometheus web console.

  3. After you have viewed the data, delete the Prometheus instance and data:

    $ make prometheus-cleanup

Using the Velero CLI to debug Backup and Restore CRs

You can debug the Backup and Restore custom resources (CRs) and partial migration failures with the Velero command line interface (CLI). The Velero CLI runs in the velero pod.

Velero command syntax

Velero CLI commands use the following syntax:

  $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> <command> <resource_id>

You can specify velero-<pod> -n openshift-migration in place of $(oc get pods -n openshift-migration -o name | grep velero).
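If you run Velero commands often, the syntax can be wrapped in a small helper that prints the full invocation. velero_cmd is an invented name, not part of MTC; it echoes the command rather than executing it, so it is safe to try anywhere:

```shell
# velero_cmd: print the full "oc exec" invocation for a Velero CLI call
# (invented helper; printing instead of executing keeps this a dry run).
velero_cmd() {
  printf 'oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero %s\n' "$*"
}

velero_cmd backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql
```

When the printed command looks right, run it against the cluster.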

Help command

The Velero help command lists all the Velero CLI commands:

  $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero --help

Describe command

The Velero describe command provides a summary of warnings and errors associated with a Velero resource:

  $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> describe <resource_id>

Example

  $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero backup describe 0e44ae00-5dc3-11eb-9ca8-df7e5254778b-2d8ql

Logs command

The Velero logs command provides the logs associated with a Velero resource:

  $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero <resource> logs <resource_id>

Example

  $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -- ./velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf

Debugging a partial migration failure

You can debug a partial migration failure warning message by using the Velero CLI to examine the Restore custom resource (CR) logs.

A partial failure occurs when Velero encounters an issue that does not cause a migration to fail. For example, if a custom resource definition (CRD) is missing or if there is a discrepancy between CRD versions on the source and target clusters, the migration completes but the CR is not created on the target cluster.

Velero logs the issue as a partial failure and then processes the rest of the objects in the Backup CR.

Procedure

  1. Check the status of a MigMigration CR:

    $ oc get migmigration <migmigration> -o yaml

    Example output

    status:
      conditions:
      - category: Warn
        durable: true
        lastTransitionTime: "2021-01-26T20:48:40Z"
        message: 'Final Restore openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf: partially failed on destination cluster'
        status: "True"
        type: VeleroFinalRestorePartiallyFailed
      - category: Advisory
        durable: true
        lastTransitionTime: "2021-01-26T20:48:42Z"
        message: The migration has completed with warnings, please look at `Warn` conditions.
        reason: Completed
        status: "True"
        type: SucceededWithWarnings
  2. Check the status of the Restore CR by using the Velero describe command:

    $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore describe <restore>

    Example output

    Phase: PartiallyFailed (run 'velero restore logs ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf' for more information)
    Errors:
      Velero: <none>
      Cluster: <none>
      Namespaces:
        migration-example: error restoring example.com/migration-example/migration-example: the server could not find the requested resource
  3. Check the Restore CR logs by using the Velero logs command:

    $ oc exec $(oc get pods -n openshift-migration -o name | grep velero) -n openshift-migration -- ./velero restore logs <restore>

    Example output

    time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore migration-example: migration-example" logSource="pkg/restore/restore.go:1107" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf
    time="2021-01-26T20:48:37Z" level=info msg="error restoring migration-example: the server could not find the requested resource" logSource="pkg/restore/restore.go:1170" restore=openshift-migration/ccc7c2d0-6017-11eb-afab-85d0007f5a19-x4lbf

    The Restore CR log error message, the server could not find the requested resource, indicates the cause of the partially failed migration.
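Restore logs can be long; filtering for level=error lines narrows them to the cause quickly. The sketch below first saves a shortened, invented log to a file so that the grep is self-contained; on a live cluster you would pipe the velero restore logs output instead.

```shell
# A shortened, invented stand-in for a saved Velero restore log.
cat > /tmp/restore.log <<'EOF'
time="2021-01-26T20:48:37Z" level=info msg="Attempting to restore migration-example: migration-example"
time="2021-01-26T20:48:37Z" level=error msg="error restoring migration-example: the server could not find the requested resource"
EOF

# Keep only the error entries.
grep 'level=error' /tmp/restore.log
```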

Using MTC custom resources for troubleshooting

You can check the following Migration Toolkit for Containers (MTC) custom resources (CRs) to troubleshoot a failed migration:

  • MigCluster

  • MigStorage

  • MigPlan

  • BackupStorageLocation

    The BackupStorageLocation CR contains a migrationcontroller label to identify the MTC instance that created the CR:

    labels:
      migrationcontroller: ebe13bee-c803-47d0-a9e9-83f380328b93
  • VolumeSnapshotLocation

    The VolumeSnapshotLocation CR contains a migrationcontroller label to identify the MTC instance that created the CR:

    labels:
      migrationcontroller: ebe13bee-c803-47d0-a9e9-83f380328b93
  • MigMigration

  • Backup

    MTC changes the reclaim policy of migrated persistent volumes (PVs) to Retain on the target cluster. The Backup CR contains an openshift.io/orig-reclaim-policy annotation that indicates the original reclaim policy. You can manually restore the reclaim policy of the migrated PVs.

  • Restore

Procedure

  1. List the MigMigration CRs in the openshift-migration namespace:

    $ oc get migmigration -n openshift-migration

    Example output

    NAME                                   AGE
    88435fe0-c9f8-11e9-85e6-5d593ce65e10   6m42s
  2. Inspect the MigMigration CR:

    $ oc describe migmigration 88435fe0-c9f8-11e9-85e6-5d593ce65e10 -n openshift-migration

    The output is similar to the following examples.

MigMigration example output

  name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10
  namespace: openshift-migration
  labels: <none>
  annotations: touch: 3b48b543-b53e-4e44-9d34-33563f0f8147
  apiVersion: migration.openshift.io/v1alpha1
  kind: MigMigration
  metadata:
    creationTimestamp: 2019-08-29T01:01:29Z
    generation: 20
    resourceVersion: 88179
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/88435fe0-c9f8-11e9-85e6-5d593ce65e10
    uid: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
  spec:
    migPlanRef:
      name: socks-shop-mig-plan
      namespace: openshift-migration
    quiescePods: true
    stage: false
  status:
    conditions:
      category: Advisory
      durable: True
      lastTransitionTime: 2019-08-29T01:03:40Z
      message: The migration has completed successfully.
      reason: Completed
      status: True
      type: Succeeded
    phase: Completed
    startTimestamp: 2019-08-29T01:01:29Z
  events: <none>

Velero backup CR #2 example output that describes the PV data

  apiVersion: velero.io/v1
  kind: Backup
  metadata:
    annotations:
      openshift.io/migrate-copy-phase: final
      openshift.io/migrate-quiesce-pods: "true"
      openshift.io/migration-registry: 172.30.105.179:5000
      openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-44dd3bd5-c9f8-11e9-95ad-0205fe66cbb6
      openshift.io/orig-reclaim-policy: delete
    creationTimestamp: "2019-08-29T01:03:15Z"
    generateName: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-
    generation: 1
    labels:
      app.kubernetes.io/part-of: migration
      migmigration: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
      migration-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
      velero.io/storage-location: myrepo-vpzq9
    name: 88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
    namespace: openshift-migration
    resourceVersion: "87313"
    selfLink: /apis/velero.io/v1/namespaces/openshift-migration/backups/88435fe0-c9f8-11e9-85e6-5d593ce65e10-59gb7
    uid: c80dbbc0-c9f8-11e9-95ad-0205fe66cbb6
  spec:
    excludedNamespaces: []
    excludedResources: []
    hooks:
      resources: []
    includeClusterResources: null
    includedNamespaces:
    - sock-shop
    includedResources:
    - persistentvolumes
    - persistentvolumeclaims
    - namespaces
    - imagestreams
    - imagestreamtags
    - secrets
    - configmaps
    - pods
    labelSelector:
      matchLabels:
        migration-included-stage-backup: 8886de4c-c9f8-11e9-95ad-0205fe66cbb6
    storageLocation: myrepo-vpzq9
    ttl: 720h0m0s
    volumeSnapshotLocations:
    - myrepo-wv6fx
  status:
    completionTimestamp: "2019-08-29T01:02:36Z"
    errors: 0
    expiration: "2019-09-28T01:02:35Z"
    phase: Completed
    startTimestamp: "2019-08-29T01:02:35Z"
    validationErrors: null
    version: 1
    volumeSnapshotsAttempted: 0
    volumeSnapshotsCompleted: 0
    warnings: 0

Velero restore CR #2 example output that describes the Kubernetes resources

  apiVersion: velero.io/v1
  kind: Restore
  metadata:
    annotations:
      openshift.io/migrate-copy-phase: final
      openshift.io/migrate-quiesce-pods: "true"
      openshift.io/migration-registry: 172.30.90.187:5000
      openshift.io/migration-registry-dir: /socks-shop-mig-plan-registry-36f54ca7-c925-11e9-825a-06fa9fb68c88
    creationTimestamp: "2019-08-28T00:09:49Z"
    generateName: e13a1b60-c927-11e9-9555-d129df7f3b96-
    generation: 3
    labels:
      app.kubernetes.io/part-of: migration
      migmigration: e18252c9-c927-11e9-825a-06fa9fb68c88
      migration-final-restore: e18252c9-c927-11e9-825a-06fa9fb68c88
    name: e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
    namespace: openshift-migration
    resourceVersion: "82329"
    selfLink: /apis/velero.io/v1/namespaces/openshift-migration/restores/e13a1b60-c927-11e9-9555-d129df7f3b96-gb8nx
    uid: 26983ec0-c928-11e9-825a-06fa9fb68c88
  spec:
    backupName: e13a1b60-c927-11e9-9555-d129df7f3b96-sz24f
    excludedNamespaces: null
    excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
    includedNamespaces: null
    includedResources: null
    namespaceMapping: null
    restorePVs: true
  status:
    errors: 0
    failureReason: ""
    phase: Completed
    validationErrors: null
    warnings: 15

Common issues and concerns

This section describes common issues and concerns that can arise during migration.

Updating deprecated internal images

If your application uses images from the openshift namespace, the required versions of the images must be present on the target cluster.

If an OKD 3 image is deprecated in OKD 4.7, you can manually update the image stream tag by using podman.

Prerequisites

  • You must have podman installed.

  • You must be logged in as a user with cluster-admin privileges.

  • If you are using insecure registries, add your registry host values to the [registries.insecure] section of /etc/containers/registries.conf to ensure that podman does not encounter a TLS verification error.

  • The internal registries must be exposed on the source and target clusters.

Procedure

  1. Ensure that the internal registries are exposed on the OKD 3 and 4 clusters.

    The internal registry is exposed by default on OKD 4.

  2. If you are using insecure registries, add your registry host values to the [registries.insecure] section of /etc/containers/registries.conf to ensure that podman does not encounter a TLS verification error.

  3. Log in to the OKD 3 registry:

    $ podman login -u $(oc whoami) -p $(oc whoami -t) --tls-verify=false <registry_url>:<port>
  4. Log in to the OKD 4 registry:

    $ podman login -u $(oc whoami) -p $(oc whoami -t) --tls-verify=false <registry_url>:<port>
  5. Pull the OKD 3 image:

    $ podman pull <registry_url>:<port>/openshift/<image>
  6. Tag the OKD 3 image for the OKD 4 registry:

    $ podman tag <registry_url>:<port>/openshift/<image> \ (1)
      <registry_url>:<port>/openshift/<image> (2)

    (1) Specify the registry URL and port for the OKD 3 cluster.
    (2) Specify the registry URL and port for the OKD 4 cluster.
  7. Push the image to the OKD 4 registry:

    $ podman push <registry_url>:<port>/openshift/<image> (1)

    (1) Specify the OKD 4 cluster.
  8. Verify that the image has a valid image stream:

    $ oc get imagestream -n openshift | grep <image>

    Example output

    NAME       IMAGE REPOSITORY                                                       TAGS     UPDATED
    my_image   image-registry.openshift-image-registry.svc:5000/openshift/my_image   latest   32 seconds ago
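When several images need updating, steps 5 to 7 can be repeated in a loop. The sketch below only echoes the podman commands (a dry run); the registry endpoints and image names are placeholders, not values from your clusters:

```shell
# Placeholder registry endpoints (replace with your real <registry_url>:<port> values).
SRC='src-registry.example.com:5000'
DST='dst-registry.example.com:5000'

# Dry run: print the pull/tag/push commands for each image for review.
for image in jenkins nodejs; do
  echo "podman pull ${SRC}/openshift/${image}"
  echo "podman tag ${SRC}/openshift/${image} ${DST}/openshift/${image}"
  echo "podman push ${DST}/openshift/${image}"
done
```

When the printed commands look correct, run them against the real registries.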

Direct volume migration does not complete

If direct volume migration does not complete, the target cluster might not have the same node-selector annotations as the source cluster.

Migration Toolkit for Containers (MTC) migrates namespaces with all annotations to preserve security context constraints and scheduling requirements. During direct volume migration, MTC creates Rsync transfer pods on the target cluster in the namespaces that were migrated from the source cluster. If a target cluster namespace does not have the same annotations as the source cluster namespace, the Rsync transfer pods cannot be scheduled. The Rsync pods remain in a Pending state.

You can identify and fix this issue by performing the following procedure.

Procedure

  1. Check the status of the MigMigration CR:

    $ oc describe migmigration <migmigration> -n openshift-migration

    The output includes the following status message:

    Example output

    Some or all transfer pods are not running for more than 10 mins on destination cluster
  2. On the source cluster, obtain the details of a migrated namespace:

    $ oc get namespace <namespace> -o yaml (1)

    (1) Specify the migrated namespace.
  3. On the target cluster, edit the migrated namespace:

    $ oc edit namespace <namespace>
  4. Add the missing openshift.io/node-selector annotations to the migrated namespace as in the following example:

    apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        openshift.io/node-selector: "region=east"
    ...
  5. Run the migration plan again.
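To spot the missing annotation quickly, you can save both namespace definitions and grep them for node-selector. The file names and contents below are invented stand-ins for the output of the oc get namespace step:

```shell
# Invented stand-in for: oc get namespace <namespace> -o yaml (source cluster)
cat > /tmp/source-ns.yaml <<'EOF'
metadata:
  annotations:
    openshift.io/node-selector: "region=east"
EOF

# Invented stand-in for the same namespace dumped from the target cluster.
cat > /tmp/target-ns.yaml <<'EOF'
metadata:
  annotations: {}
EOF

# Print only the file(s) that contain the annotation; a file missing from the
# output is the namespace that needs editing.
grep -l 'openshift.io/node-selector' /tmp/source-ns.yaml /tmp/target-ns.yaml
```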

Error messages and resolutions

This section describes common error messages you might encounter with the Migration Toolkit for Containers (MTC) and how to resolve their underlying causes.

CA certificate error displayed when accessing the MTC console for the first time

If the MTC console displays a CA certificate error message the first time you try to access it, the likely cause is that a cluster uses self-signed CA certificates.

Navigate to the oauth-authorization-server URL in the error message and accept the certificate. To resolve this issue permanently, install the certificate authority so that it is trusted.

If the browser displays an Unauthorized message after you have accepted the CA certificate, navigate to the MTC console and then refresh the web page.

OAuth timeout error in the MTC console

If the MTC console displays a connection has timed out message after you have accepted a self-signed certificate, the underlying cause can vary.

To determine the cause:

  • Inspect the MTC console web page with a browser web inspector.

  • Check the Migration UI pod log for errors.

Certificate signed by unknown authority error

If you use a self-signed certificate to secure a cluster or a replication repository for the Migration Toolkit for Containers (MTC), certificate verification might fail with the following error message: Certificate signed by unknown authority.

You can create a custom CA certificate bundle file and upload it in the MTC web console when you add a cluster or a replication repository.

Procedure

Download a CA certificate from a remote endpoint and save it as a CA bundle file:

  $ echo -n | openssl s_client -connect <host_FQDN>:<port> \ (1)
    | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > <ca_bundle.cert> (2)

(1) Specify the host FQDN and port of the endpoint, for example, api.my-cluster.example.com:6443.
(2) Specify the name of the CA bundle file.
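After saving the bundle, a quick check that the file actually contains at least one PEM certificate block can save a failed upload. The bundle below is a truncated, invented stand-in; only the grep carries over to a real bundle file:

```shell
# Truncated stand-in for a CA bundle saved by the openssl command above.
cat > /tmp/ca_bundle.cert <<'EOF'
-----BEGIN CERTIFICATE-----
MIIBszCCAVqgAwIBAgIU...
-----END CERTIFICATE-----
EOF

# Count the PEM certificate blocks; 0 means the openssl capture failed.
grep -c -- '-----BEGIN CERTIFICATE-----' /tmp/ca_bundle.cert
```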

Backup storage location errors in the Velero pod log

If a Velero Backup custom resource contains a reference to a backup storage location (BSL) that does not exist, the Velero pod log might display the following error messages:

  Error checking repository for stale locks
  Error getting backup storage location: backupstoragelocation.velero.io \"my-bsl\" not found

You can ignore these error messages. A missing BSL cannot cause a migration to fail.

Pod volume backup timeout error in the Velero pod log

If a migration fails because Restic times out, the Velero pod log displays the following error:

  level=error msg="Error backing up item" backup=velero/monitoring error="timed out
  waiting for all PodVolumeBackups to complete" error.file="/go/src/github.com/
  heptio/velero/pkg/restic/backupper.go:165" error.function="github.com/heptio/
  velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1

The default value of restic_timeout is one hour. You can increase this parameter for large migrations, keeping in mind that a higher value may delay the return of error messages.

Procedure

  1. In the OKD web console, navigate to Operators → Installed Operators.

  2. Click Migration Toolkit for Containers Operator.

  3. In the MigrationController tab, click migration-controller.

  4. In the YAML tab, update the following parameter value:

    spec:
      restic_timeout: 1h (1)

    (1) Valid units are h (hours), m (minutes), and s (seconds), for example, 3h30m15s.
  5. Click Save.
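The same edit can be made from the CLI with oc patch instead of the console YAML tab. This is a sketch: the CR name migration-controller matches the console step above, and the command is only printed here (a dry run) rather than executed:

```shell
# Dry run: build and print the patch command instead of running it.
PATCH_CMD="oc patch migrationcontroller migration-controller -n openshift-migration --type=merge -p '{\"spec\":{\"restic_timeout\":\"3h30m15s\"}}'"
echo "$PATCH_CMD"
```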

Restic verification errors in the MigMigration custom resource

If data verification fails when migrating a persistent volume with the file system data copy method, the MigMigration CR displays the following error:

MigMigration CR status

  status:
    conditions:
    - category: Warn
      durable: true
      lastTransitionTime: 2020-04-16T20:35:16Z
      message: There were verify errors found in 1 Restic volume restores. See restore `<registry-example-migration-rvwcm>` for details (1)
      status: "True"
      type: ResticVerifyErrors (2)

(1) The error message identifies the Restore CR name.
(2) ResticVerifyErrors is a general error warning type that includes verification errors.

A data verification error does not cause the migration process to fail.

You can check the Restore CR to troubleshoot the data verification error.

Procedure

  1. Log in to the target cluster.

  2. View the Restore CR:

    $ oc describe restore <registry-example-migration-rvwcm> -n openshift-migration

    The output identifies the persistent volume with PodVolumeRestore errors.

    Example output

    status:
      phase: Completed
      podVolumeRestoreErrors:
      - kind: PodVolumeRestore
        name: <registry-example-migration-rvwcm-98t49>
        namespace: openshift-migration
      podVolumeRestoreResticErrors:
      - kind: PodVolumeRestore
        name: <registry-example-migration-rvwcm-98t49>
        namespace: openshift-migration
  3. View the PodVolumeRestore CR:

    $ oc describe podvolumerestore <registry-example-migration-rvwcm-98t49>

    The output identifies the Restic pod that logged the errors.

    PodVolumeRestore CR with Restic pod error

    completionTimestamp: 2020-05-01T20:49:12Z
    errors: 1
    resticErrors: 1
    ...
    resticPod: <restic-nr2v5>
  4. View the Restic pod log to locate the errors:

    $ oc logs -f <restic-nr2v5>

Restic permission error when migrating from NFS storage with root_squash enabled

If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody and does not have permission to perform the migration. The Restic pod log displays the following error:

Restic permission error

  backup=openshift-migration/<backup_id> controller=pod-volume-backup error="fork/exec
  /usr/bin/restic: permission denied" error.file="/go/src/github.com/vmware-tanzu/
  velero/pkg/controller/pod_volume_backup_controller.go:280" error.function=
  "github.com/vmware-tanzu/velero/pkg/controller.(*podVolumeBackupController).processBackup"
  logSource="pkg/controller/pod_volume_backup_controller.go:280" name=<backup_id>
  namespace=openshift-migration

You can resolve this issue by creating a supplemental group for Restic and adding the group ID to the MigrationController CR manifest.

Procedure

  1. Create a supplemental group for Restic on the NFS storage.

  2. Set the setgid bit on the NFS directories so that group ownership is inherited.

  3. Add the restic_supplemental_groups parameter to the MigrationController CR manifest on the source and target clusters:

    spec:
      restic_supplemental_groups: <group_id> (1)

    (1) Specify the supplemental group ID.
  4. Wait for the Restic pods to restart so that the changes are applied.

Known issues

This release has the following known issues:

  • During migration, the Migration Toolkit for Containers (MTC) preserves the following namespace annotations:

    • openshift.io/sa.scc.mcs

    • openshift.io/sa.scc.supplemental-groups

    • openshift.io/sa.scc.uid-range

      These annotations preserve the UID range, ensuring that the containers retain their file system permissions on the target cluster. There is a risk that the migrated UIDs could duplicate UIDs within an existing or future namespace on the target cluster. (BZ#1748440)

  • Most cluster-scoped resources are not yet handled by MTC. If your applications require cluster-scoped resources, you might have to create them manually on the target cluster.

  • If a migration fails, the migration plan does not retain custom PV settings for quiesced pods. You must manually roll back the migration, delete the migration plan, and create a new migration plan with your PV settings. (BZ#1784899)

  • If a large migration fails because Restic times out, you can increase the restic_timeout parameter value (default: 1h) in the MigrationController custom resource (CR) manifest.

  • If you select the data verification option for PVs that are migrated with the file system copy method, performance is significantly slower.

  • If you are migrating data from NFS storage and root_squash is enabled, Restic maps to nfsnobody. The migration fails and a permission error is displayed in the Restic pod log. (BZ#1873641)

    You can resolve this issue by adding supplemental groups for Restic to the MigrationController CR manifest:

    spec:
      ...
      restic_supplemental_groups:
      - 5555
      - 6666
  • If you perform direct volume migration with nodes that are in different availability zones, the migration might fail because the migrated pods cannot access the PVC. (BZ#1947487)

Rolling back a migration

You can roll back a migration by using the MTC web console or the CLI.

You can also roll back a migration manually.

Rolling back a migration by using the MTC web console

You can roll back a migration by using the Migration Toolkit for Containers (MTC) web console.

The following resources remain in the migrated namespaces for debugging after a failed direct volume migration (DVM):

  • Config maps (source and destination clusters)

  • Secret CRs (source and destination clusters)

  • Rsync CRs (source cluster)

These resources do not affect rollback. You can delete them manually.

If you later run the same migration plan successfully, the resources from the failed migration are deleted automatically.

If your application was stopped during a failed migration, you must roll back the migration to prevent data corruption in the persistent volume.

Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.

Procedure

  1. In the MTC web console, click Migration plans.

  2. Click the Options menu (kebab) beside a migration plan and select Rollback under Migration.

  3. Click Rollback and wait for rollback to complete.

    In the migration plan details, Rollback succeeded is displayed.

  4. Verify that rollback was successful in the OKD web console of the source cluster:

    1. Click Home → Projects.

    2. Click the migrated project to view its status.

    3. In the Routes section, click Location to verify that the application is functioning, if applicable.

    4. Click Workloads → Pods to verify that the pods are running in the migrated namespace.

    5. Click Storage → Persistent Volumes to verify that the migrated persistent volume is correctly provisioned.

Rolling back a migration from the command line interface

You can roll back a migration by creating a MigMigration custom resource (CR) from the command line interface.

The following resources remain in the migrated namespaces for debugging after a failed direct volume migration (DVM):

  • Config maps (source and destination clusters)

  • Secret CRs (source and destination clusters)

  • Rsync CRs (source cluster)

These resources do not affect rollback. You can delete them manually.

If you later run the same migration plan successfully, the resources from the failed migration are deleted automatically.

If your application was stopped during a failed migration, you must roll back the migration to prevent data corruption in the persistent volume.

Rollback is not required if the application was not stopped during migration because the original application is still running on the source cluster.

Procedure

  1. Create a MigMigration CR based on the following example:

    $ cat << EOF | oc apply -f -
    apiVersion: migration.openshift.io/v1alpha1
    kind: MigMigration
    metadata:
      labels:
        controller-tools.k8s.io: "1.0"
      name: <migmigration>
      namespace: openshift-migration
    spec:
      ...
      rollback: true
      ...
      migPlanRef:
        name: <migplan> (1)
        namespace: openshift-migration
    EOF

    (1) Specify the name of the associated MigPlan CR.
  2. In the MTC web console, verify that the migrated project resources have been removed from the target cluster.

  3. Verify that the migrated project resources are present in the source cluster and that the application is running.

Rolling back a migration manually

You can roll back a failed migration manually by deleting the stage pods and unquiescing the application.

If you run the same migration plan successfully, the resources from the failed migration are deleted automatically.

The following resources remain in the migrated namespaces after a failed direct volume migration (DVM):

  • Config maps (source and destination clusters)

  • Secret CRs (source and destination clusters)

  • Rsync CRs (source cluster)

These resources do not affect rollback. You can delete them manually.

Procedure

  1. Delete the stage pods on all clusters:

    $ oc delete $(oc get pods -l migration.openshift.io/is-stage-pod -n <namespace>) (1)

    (1) Specify a namespace from the MigPlan CR.
  2. Unquiesce the application on the source cluster by scaling the replicas to their premigration number:

    $ oc scale deployment <deployment> --replicas=<premigration_replicas>

    The migration.openshift.io/preQuiesceReplicas annotation in the Deployment CR displays the premigration number of replicas:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
        migration.openshift.io/preQuiesceReplicas: "1"
  3. Verify that the application pods are running on the source cluster:

    $ oc get pod -n <namespace>
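Step 2 reads the premigration replica count from the annotation. The sketch below extracts it from a saved deployment definition; the file content is an invented stand-in for oc get deployment -o yaml output, and the scale command is only printed:

```shell
# Invented stand-in for a saved deployment definition.
cat > /tmp/deploy.yaml <<'EOF'
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    migration.openshift.io/preQuiesceReplicas: "1"
EOF

# Pull the quoted value of the preQuiesceReplicas annotation.
replicas=$(awk -F'"' '/preQuiesceReplicas/ {print $2}' /tmp/deploy.yaml)

# Print the scale command you would run (dry run; <deployment> is a placeholder).
echo "oc scale deployment <deployment> --replicas=${replicas}"
```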

Uninstalling MTC and deleting resources

You can uninstall the Migration Toolkit for Containers (MTC) and delete its resources to clean up the cluster.

Deleting the velero CRDs removes Velero from the cluster.

Prerequisites

  • You must be logged in as a user with cluster-admin privileges.

Procedure

  1. Delete the MigrationController custom resource (CR) on all clusters:

    $ oc delete migrationcontroller <migration_controller>

  2. Uninstall the Migration Toolkit for Containers Operator on OKD 4 by using the Operator Lifecycle Manager.

  3. Uninstall the Migration Toolkit for Containers Operator on OKD 3 by deleting the operator CR manifest:

    $ oc delete -f operator.yml
  4. Delete cluster-scoped resources on all clusters by running the following commands:

    • migration custom resource definitions (CRDs):

      $ oc delete $(oc get crds -o name | grep 'migration.openshift.io')

    • velero CRDs:

      $ oc delete $(oc get crds -o name | grep 'velero')

    • migration cluster roles:

      $ oc delete $(oc get clusterroles -o name | grep 'migration.openshift.io')

    • migration-operator cluster role:

      $ oc delete clusterrole migration-operator

    • velero cluster roles:

      $ oc delete $(oc get clusterroles -o name | grep 'velero')

    • migration cluster role bindings:

      $ oc delete $(oc get clusterrolebindings -o name | grep 'migration.openshift.io')

    • migration-operator cluster role bindings:

      $ oc delete clusterrolebindings migration-operator

    • velero cluster role bindings:

      $ oc delete $(oc get clusterrolebindings -o name | grep 'velero')
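The grep-based deletions in step 4 follow one pattern per resource kind, so they can be generated in a loop. This sketch only echoes each command (a dry run) so you can review the list before running anything destructive:

```shell
# Dry run: print one delete command per resource kind and name pattern.
for kind in crds clusterroles clusterrolebindings; do
  for pattern in 'migration.openshift.io' 'velero'; do
    echo "oc delete \$(oc get ${kind} -o name | grep '${pattern}')"
  done
done
```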

Additional resources for uninstalling MTC