Upgrading and Reinstalling

How to upgrade or reinstall your Pipelines deployment on Google Cloud Platform (GCP)

Starting from Kubeflow v0.5, Kubeflow Pipelines persists the pipeline data in permanent storage volumes. Kubeflow Pipelines therefore supports the following capabilities:

  • Reinstall: You can delete a cluster and create a new cluster, specifying the existing storage volumes to retrieve the original data in the new cluster. This guide tells you how to reinstall Kubeflow Pipelines as part of a full Kubeflow deployment.

  • Upgrade (limited support): The full Kubeflow deployment currently supports upgrading in Alpha status with limited support. Check the following sources for progress updates:

Before you start

This guide tells you how to reinstall Kubeflow Pipelines as part of a full Kubeflow deployment on Google Kubernetes Engine (GKE). See the Kubeflow deployment guide.

Instead of the full Kubeflow deployment, you can use Kubeflow Pipelines Standalone or GCP Hosted ML Pipelines (Alpha), which support different options for upgrading and reinstalling. See the Kubeflow Pipelines installation options.

Kubeflow Pipelines data storage

Kubeflow Pipelines creates and manages the following data related to your machine learning pipeline:

  • Metadata: Experiments, jobs, runs, etc. Kubeflow Pipelines stores the pipeline metadata in a MySQL database.
  • Artifacts: Pipeline packages, metrics, views, etc. Kubeflow Pipelines stores the artifacts in a Minio server.

Kubeflow Pipelines uses the Kubernetes PersistentVolume (PV) subsystem to provision storage for the MySQL database and the Minio server. On GCP, Kubeflow Pipelines creates Google Compute Engine Persistent Disks (PDs) and mounts them as PVs.

After deploying Kubeflow on GCP, you can see two entries in the GCP Deployment Manager, one for the cluster deployment and one for the storage deployment:

Deployment Manager showing the storage deployment entry

The entry with the suffix -storage creates one PD for the metadata store and one for the artifact store:

Deployment Manager showing details of the storage deployment entry
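When you reinstall, you need the names of these two PDs. The following is a minimal sketch, not an official tool: it assumes the metadata disk name ends in "metadata-store" and the artifact disk name ends in "artifact-store" (an assumption; confirm the actual names in Deployment Manager). It reads disk names on standard input, for example from `gcloud compute disks list --format="value(name)"`, and prints the matching `mysqlPd=` and `minioPd=` lines. The function name `pick_pipeline_disks` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map existing PD names to the params.env entries used
# when reinstalling Kubeflow Pipelines.
# ASSUMPTION: the metadata disk name ends in "metadata-store" and the
# artifact disk name ends in "artifact-store".
pick_pipeline_disks() {
  local name
  while read -r name; do
    case "$name" in
      *metadata-store) echo "mysqlPd=${name}" ;;  # MySQL (metadata) disk
      *artifact-store) echo "minioPd=${name}" ;;  # Minio (artifact) disk
    esac                                          # other disks are ignored
  done
}

# Usage (requires the gcloud CLI and an authenticated project):
#   gcloud compute disks list --format="value(name)" | pick_pipeline_disks
```

Disks whose names match neither suffix are skipped, so the output only contains lines you can copy into the params.env files.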

Reinstalling Kubeflow Pipelines

You can delete a Kubeflow cluster and create a new one, specifying your existing storage to retrieve the original data in the new cluster.

Notes:

  • You must use the command-line deployment. You cannot reinstall Kubeflow Pipelines using the web interface.
  • When you run kfctl apply or kfctl build, use a deployment name that differs from your existing deployment name. Otherwise, kfctl will delete your data in the existing PDs. This guide defines the deployment name in the ${KF_NAME} environment variable.
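Because reusing the old deployment name can cause kfctl to delete the data in the existing PDs, a small guard before running kfctl can catch the mistake early. This is a minimal sketch, assuming you record the existing deployment name in a variable called `OLD_KF_NAME`; that variable and the function `check_new_kf_name` are hypothetical and not part of the Kubeflow tooling.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail if the new deployment name is empty or matches the
# existing one, since reusing the old name can delete data in the existing PDs.
check_new_kf_name() {
  local new_name="$1" old_name="$2"
  if [ -z "$new_name" ]; then
    echo "KF_NAME is empty" >&2
    return 1
  fi
  if [ "$new_name" = "$old_name" ]; then
    echo "KF_NAME must differ from the existing deployment name (${old_name})" >&2
    return 1
  fi
  return 0
}

# Usage: run before kfctl build or kfctl apply.
#   check_new_kf_name "${KF_NAME}" "${OLD_KF_NAME}" || exit 1
```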

To reinstall Kubeflow Pipelines:

  • Follow the command-line deployment instructions, but note the following changes in the procedure.

  • Set ${KF_NAME} to a name that differs from your existing deployment name.

  • Before running the kfctl apply command:

    • Edit ${KF_DIR}/gcp_config/storage-kubeflow.yaml and set the following flag to skip creating new storage:

          ...
          createPipelinePersistentStorage: false
          ...

    • Edit ${KF_DIR}/kustomize/minio/overlays/minioPd/params.env and specify the PD that your existing deployment uses for the Minio server:

          ...
          minioPd=[NAME-OF-ARTIFACT-STORAGE-DISK]
          ...

    • Edit ${KF_DIR}/kustomize/mysql/overlays/mysqlPd/params.env and specify the PD that your existing deployment uses for the MySQL database:

          ...
          mysqlPd=[NAME-OF-METADATA-STORAGE-DISK]
          ...

  • Run the kfctl apply command to deploy Kubeflow as usual:

          kfctl apply -V -f ${CONFIG_FILE}
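The three file edits in the procedure can also be scripted. The following is a minimal sketch, assuming each key already appears on its own line in the target file (`createPipelinePersistentStorage:` in the YAML file, `minioPd=` and `mysqlPd=` in the params.env files); it does not add a key that is missing. The functions `replace_line` and `apply_reinstall_edits` are hypothetical helpers, not part of kfctl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every line that starts with PREFIX (after any
# leading whitespace, which is kept) with NEWLINE. Writes through a temp file
# so it works with both GNU and BSD sed, and keeps the original file's mode.
replace_line() {
  local file="$1" prefix="$2" newline="$3" tmp
  tmp="$(mktemp)" || return 1
  sed "s|^\([[:space:]]*\)${prefix}.*|\1${newline}|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Hypothetical helper: apply the reinstall edits under KF_DIR.
# Arguments: KF_DIR, artifact (Minio) disk name, metadata (MySQL) disk name.
apply_reinstall_edits() {
  local kf_dir="$1" minio_disk="$2" mysql_disk="$3"
  # Skip creating new pipeline storage.
  replace_line "${kf_dir}/gcp_config/storage-kubeflow.yaml" \
    "createPipelinePersistentStorage:" "createPipelinePersistentStorage: false" || return 1
  # Point Minio (artifacts) at the existing artifact disk.
  replace_line "${kf_dir}/kustomize/minio/overlays/minioPd/params.env" \
    "minioPd=" "minioPd=${minio_disk}" || return 1
  # Point MySQL (metadata) at the existing metadata disk.
  replace_line "${kf_dir}/kustomize/mysql/overlays/mysqlPd/params.env" \
    "mysqlPd=" "mysqlPd=${mysql_disk}"
}

# Usage (disk names are placeholders for your existing PDs):
#   apply_reinstall_edits "${KF_DIR}" NAME-OF-ARTIFACT-STORAGE-DISK NAME-OF-METADATA-STORAGE-DISK
#   kfctl apply -V -f ${CONFIG_FILE}
```

Reviewing the three files by hand after running a script like this is still worthwhile, because a key that is missing or spelled differently is left unchanged.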

You should now have a new Kubeflow deployment that uses the same pipelines data storage as your previous deployment. Follow the steps in the deployment guide to check your deployment.