End-to-end Kubeflow on GCP


This guide walks you through an end-to-end example of Kubeflow on Google Cloud Platform (GCP). By working through the guide, you learn how to deploy Kubeflow on Kubernetes Engine (GKE), train an MNIST machine learning model for image classification, and use the model for online inference (also known as online prediction).

Introduction

Overview of GCP and GKE

Google Cloud Platform (GCP) is a suite of cloud computing services running on Google infrastructure. The services include compute power, data storage, data analytics, and machine learning.

The Cloud SDK is a set of tools that you can use to interact with GCP from the command line, including the gcloud command and others.

Kubernetes Engine (GKE) is a managed service on GCP where you can deploy containerized applications. You describe the resources that your application needs, and GKE provisions and manages the underlying cloud resources.

Here’s a list of the primary GCP services that you use when following this guide:

  • Deployment Manager
  • Kubernetes Engine (GKE)
  • Container Registry
  • Cloud Storage

The model and the data

This tutorial trains a TensorFlow model on the MNIST dataset, which is the “hello world” of machine learning.

The MNIST dataset contains a large number of images of hand-written digits in the range 0 to 9, as well as the labels identifying the digit in each image.

After training, the model can classify incoming images into 10 categories (0 to 9) based on what it’s learned about handwritten images. In other words, you send an image to the model, and the model does its best to identify the digit shown in the image.

Prediction UI

In the above screenshot, the image shows a hand-written 7. This image was the input to the model. The table below the image shows a bar graph for each classification label from 0 to 9, as output by the model. Each bar represents the probability that the image matches the respective label. Judging by this screenshot, the model seems pretty confident that this image is a 7.
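
Each bar height is a class probability. A classifier like this one typically produces a raw score (logit) per digit and normalizes the scores with a softmax function. The following is a minimal Python sketch of that step; the logit values are invented for illustration:

  import numpy as np

  def softmax(logits):
      # Convert raw scores into probabilities that sum to 1.
      exps = np.exp(logits - np.max(logits))  # shift by max for numerical stability
      return exps / exps.sum()

  # Hypothetical logits for one image: the score at index 7 dominates,
  # so the bar for the digit 7 is the tallest in the chart.
  logits = np.array([0.1, 0.2, 0.0, 0.3, 0.1, 0.2, 0.1, 4.5, 0.3, 0.6])
  print(softmax(logits))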

The overall workflow

The following diagram shows what you accomplish by following this guide:

ML workflow for training and serving an MNIST model

In summary:

  • Setting up Kubeflow in a GKE cluster.

  • Testing the code locally using a Jupyter notebook.

  • Training the model:

    • Packaging the training code in a container image.
    • Running the training job on GKE.

  • Using the model for prediction (inference):

    • Saving the trained model to Cloud Storage.
    • Using TensorFlow Serving to serve the model.
    • Running a simple web app to send a prediction request to the model and display the result.

It’s time to get started!

Set up your environment

Download the project files

To simplify this tutorial, you can use a set of prepared files that include a TensorFlow application for training your model, a web UI to send prediction requests and display the results, and the Docker files to build runnable containers for the training and prediction applications. The project files are in the Kubeflow examples repository on GitHub.

Clone the project files and go to the directory containing the MNIST example:

  cd ${HOME}
  git clone https://github.com/kubeflow/examples.git
  cd examples/mnist
  WORKING_DIR=$(pwd)

As an alternative to cloning, you can download the Kubeflow examples repository zip file.

Set up your GCP account and SDK

Follow these steps to set up your GCP environment:

  • Select or create a project on the GCP Console.
  • Make sure that billing is enabled for your project.
  • Install the Cloud SDK.

Notes:

  • As you work through this tutorial, your project uses billable components of GCP. To minimize costs, follow the instructions to clean up your GCP resources when you’ve finished with them.
  • This guide assumes you want to manage your GCP environment on your own server rather than in the Cloud Shell environment. If you choose to use the Cloud Shell, some of the components are pre-installed in your shell.

Install Docker

Follow the Docker installation guide.

Install kubectl

Run the following Cloud SDK command to install the kubectl command-line tool for Kubernetes:

  gcloud components install kubectl

Install kustomize

Kubeflow makes use of kustomize to help manage deployments.

Make sure you have version 2.0.3 of kustomize. This tutorial does not work with later versions of kustomize, due to kustomize issue 1295.

Install kustomize v2.0.3. See the kustomize installation guide.

Set up some handy environment variables

Set up the following environment variables for use throughout the tutorial:

  • Set your GCP project ID. In the command below, replace <YOUR-PROJECT-ID> with your project ID:

    export PROJECT=<YOUR-PROJECT-ID>
    gcloud config set project ${PROJECT}

  • Set the zone for your GCP configuration. Choose a zone that offers the resources you need. See the guide to GCP regions and zones.

    • Ensure you have enough Compute Engine regional capacity. By default, the GKE cluster setup described in this guide requires 16 CPUs.
    • If you want a GPU, ensure your zone offers GPUs.

    For example, the following commands set the zone to us-central1-c:

    export ZONE=us-central1-c
    gcloud config set compute/zone ${ZONE}

  • If you want a custom name for your Kubeflow deployment, set the DEPLOYMENT_NAME environment variable. If you don’t set this environment variable, your deployment gets the default name of kubeflow:

    export DEPLOYMENT_NAME=kubeflow

Deploy Kubeflow

Follow the instructions in the guide to deploying Kubeflow on GCP, taking note of the following:

  • Make sure you deploy Kubeflow v0.7.1 or later.
  • Set up OAuth client credentials and Cloud Identity-Aware Proxy (IAP)as prompted during the deployment process.

When the cluster is ready, you can do the following:

  • Connect your local kubectl session to the cluster:

    gcloud container clusters get-credentials \
        ${DEPLOYMENT_NAME} --zone ${ZONE} --project ${PROJECT}

  • Switch to the kubeflow namespace to see the resources on the Kubeflow cluster:

    kubectl config set-context $(kubectl config current-context) --namespace=kubeflow

  • Check the resources deployed in the kubeflow namespace:

    kubectl get all

  • Access the Kubeflow UI, which becomes available at the following URI after several minutes:

    https://<deployment-name>.endpoints.<project>.cloud.goog/

The following screenshot shows the Kubeflow UI:

Kubeflow UI

Notes:

If you own or manage the domain or a subdomain with Cloud DNS, you can configure this process to be much faster.

Create a Cloud Storage bucket

The next step is to create a Cloud Storage bucket to hold your trained model.

Cloud Storage is a scalable, fully-managed object/blob store. You can use it for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download. This tutorial uses Cloud Storage to hold the trained machine learning model and associated data.

Use the gsutil mb command to create a storage bucket. Your bucket name must be unique across all of Cloud Storage. The following commands create a bucket in the us-central1 region, which corresponds to the us-central1-c zone used earlier in the tutorial:

  export BUCKET_NAME=${PROJECT}-${DEPLOYMENT_NAME}-bucket
  gsutil mb -c regional -l us-central1 gs://${BUCKET_NAME}
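
If you prefer to confirm the bucket from Python rather than from the console, a quick sketch like the following works. It assumes the google-cloud-storage package is installed and that your application default credentials are set up; substitute your own project and bucket names:

  from google.cloud import storage

  client = storage.Client(project="<YOUR-PROJECT-ID>")
  bucket = client.get_bucket("<YOUR-BUCKET-NAME>")  # the value of ${BUCKET_NAME}
  print(bucket.name, bucket.location)               # e.g. US-CENTRAL1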

(Optional) Test the code in a Jupyter notebook

The sample you downloaded contains all the code you need. If you like, you can experiment with and test the code in a Jupyter notebook.

The Kubeflow deployment includes services for spawning and managing Jupyter notebooks.

  • Follow the Kubeflow notebooks setup guide to create a Jupyter notebook server and open the Jupyter UI. Accept the default settings when configuring your notebook server. The default configuration gives you a standard CPU image with a recent version of TensorFlow.

  • Create a new notebook by clicking New > Python 2 on the Jupyter dashboard.

You can read about using notebooks in the Jupyter documentation.

  • Copy the code from your sample model at ${WORKING_DIR}/model.py and paste the code into a cell in your Jupyter notebook.

  • Run the cell in the notebook. You should see output directly beneath the notebook cell, something like this:

  INFO:tensorflow:TF_CONFIG {}
  INFO:tensorflow:cluster=None job_name=None task_index=None
  INFO:tensorflow:Will export model
  Extracting /tmp/data/train-images-idx3-ubyte.gz
  Extracting /tmp/data/train-labels-idx1-ubyte.gz
  Extracting /tmp/data/t10k-images-idx3-ubyte.gz
  Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
  WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpfNskyK
  INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_session_config': None, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3e975ea550>, '_evaluation_master': '', '_save_checkpoints_steps': 1000, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tf_random_seed': None, '_master': '', '_device_fn': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_model_dir': '/tmp/tmpfNskyK', '_train_distribute': None, '_save_summary_steps': 100}
  Train and evaluate
  INFO:tensorflow:Running training and evaluation locally (non-distributed).
  INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 1000 or save_checkpoints_secs None.
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Done calling model_fn.
  INFO:tensorflow:Create CheckpointSaverHook.
  INFO:tensorflow:Graph was finalized.
  INFO:tensorflow:Running local_init_op.
  INFO:tensorflow:Done running local_init_op.
  INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpfNskyK/model.ckpt.
  INFO:tensorflow:loss = 2.3082201, step = 1
  INFO:tensorflow:global_step/sec: 6.81158
  INFO:tensorflow:loss = 2.0083964, step = 101 (14.683 sec)
  INFO:tensorflow:Saving checkpoints for 200 into /tmp/tmpfNskyK/model.ckpt.
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Done calling model_fn.
  INFO:tensorflow:Starting evaluation at 2019-02-02-02:35:00
  INFO:tensorflow:Graph was finalized.
  INFO:tensorflow:Restoring parameters from /tmp/tmpfNskyK/model.ckpt-200
  INFO:tensorflow:Running local_init_op.
  INFO:tensorflow:Done running local_init_op.
  INFO:tensorflow:Evaluation [1/1]
  INFO:tensorflow:Finished evaluation at 2019-02-02-02:35:00
  INFO:tensorflow:Saving dict for global step 200: accuracy = 0.8046875, global_step = 200, loss = 0.79056215
  INFO:tensorflow:Saving 'checkpoint_path' summary for global step 200: /tmp/tmpfNskyK/model.ckpt-200
  INFO:tensorflow:Performing the final export in the end of training.
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Done calling model_fn.
  INFO:tensorflow:Signatures INCLUDED in export for Eval: None
  INFO:tensorflow:Signatures INCLUDED in export for Classify: None
  INFO:tensorflow:Signatures INCLUDED in export for Regress: None
  INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default', 'classes']
  INFO:tensorflow:Signatures INCLUDED in export for Train: None
  INFO:tensorflow:Restoring parameters from /tmp/tmpfNskyK/model.ckpt-200
  INFO:tensorflow:Assets added to graph.
  INFO:tensorflow:No assets to write.
  INFO:tensorflow:SavedModel written to: /tmp/tmpfNskyK/export/mnist/temp-1549074900/saved_model.pb
  INFO:tensorflow:Loss for final step: 0.70332366.
  Training done
  Export saved model
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Done calling model_fn.
  INFO:tensorflow:Signatures INCLUDED in export for Eval: None
  INFO:tensorflow:Signatures INCLUDED in export for Classify: None
  INFO:tensorflow:Signatures INCLUDED in export for Regress: None
  INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default', 'classes']
  INFO:tensorflow:Signatures INCLUDED in export for Train: None
  INFO:tensorflow:Restoring parameters from /tmp/tmpfNskyK/model.ckpt-200
  INFO:tensorflow:Assets added to graph.
  INFO:tensorflow:No assets to write.
  INFO:tensorflow:SavedModel written to: mnist/temp-1549074900/saved_model.pb
  Done exporting the model

The above output indicates that the program retrieved the sample training data, then trained the model for 200 steps, reaching an evaluation accuracy of 0.8046875 and a loss of 0.70332366 for the final step.

Don’t worry if you see the following message after the model has finished exporting: “An exception has occurred, use %tb to see the full traceback. SystemExit.” The message occurs because you haven’t yet set up a location for storing the model.

If you want to play more with the code, try adjusting the number of training steps by setting max_steps to a different value, such as 2000, or experiment with adjusting other parts of the code.

Prepare to run your training application on GKE

When you downloaded the project files into your ${WORKING_DIR} directory at the start of the tutorial, you downloaded the TensorFlow code for your training application. The code is in a Python file, model.py, in your ${WORKING_DIR} directory.

The model.py program does the following:

  • Downloads the MNIST dataset and loads it for use by the model training code.
  • Offers a choice between two models:

    • A two-layer convolutional neural network (CNN). This tutorial uses the CNN, which is the default model in model.py.
    • A linear classifier, not used in this tutorial.
  • Defines TensorFlow operations to train and evaluate the model.

  • Runs a number of training cycles.

  • Saves the trained model to a specified location, such as your Cloud Storage bucket.
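
To make that flow concrete, here is a heavily simplified sketch of the same train-then-export pattern, using the TensorFlow 1.x Estimator API that the training logs above reflect. It is not the actual model.py: the layer sizes, the feature name "x", and the paths are illustrative.

  import tensorflow as tf

  def model_fn(features, labels, mode):
      # Two-layer CNN, as described above (sizes are illustrative).
      net = tf.reshape(features["x"], [-1, 28, 28, 1])
      net = tf.layers.max_pooling2d(tf.layers.conv2d(net, 32, 5, activation=tf.nn.relu), 2, 2)
      net = tf.layers.max_pooling2d(tf.layers.conv2d(net, 64, 5, activation=tf.nn.relu), 2, 2)
      logits = tf.layers.dense(tf.layers.flatten(net), 10)
      if mode == tf.estimator.ModeKeys.PREDICT:
          return tf.estimator.EstimatorSpec(
              mode, predictions={"classes": tf.argmax(logits, axis=1)})
      # TRAIN mode (EVAL elided for brevity).
      loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
      train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
          loss, global_step=tf.train.get_global_step())
      return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

  estimator = tf.estimator.Estimator(model_fn, model_dir="gs://<bucket>/<version-tag>/")
  # estimator.train(input_fn=..., max_steps=200)
  # estimator.export_savedmodel("gs://<bucket>/<version-tag>/export", serving_input_fn)

Training followed by an export is exactly the “Training done / Export saved model” sequence you saw in the notebook output.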

Build the container for your training application

To deploy your code to Kubernetes, you must first build your local project into a Docker container image and push the image to Container Registry so that it’s available in the cloud.

  • Create a version tag from the current UNIX timestamp, to be associated with your model each time it runs:

    export VERSION_TAG=$(date +%s)

  • Set the path in Container Registry that you want to push the image to:

    export TRAIN_IMG_PATH=gcr.io/${PROJECT}/${DEPLOYMENT_NAME}-train:${VERSION_TAG}

  • Build the Docker image for your working directory:

    docker build $WORKING_DIR -t $TRAIN_IMG_PATH -f $WORKING_DIR/Dockerfile.model

The container is tagged with its eventual path in Container Registry, but it hasn’t been uploaded to Container Registry yet.

If everything went well, your program is now encapsulated in a new container.

  • Test the container locally:

    docker run -it ${TRAIN_IMG_PATH}

You may see some warnings from TensorFlow about deprecated functionality. Then you should see training logs start appearing in your output, similar to these:

  Train and evaluate
  INFO:tensorflow:Running training and evaluation locally (non-distributed).
  INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 1 secs (eval_spec.throttle_secs) or training is finished.
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Done calling model_fn.
  INFO:tensorflow:Create CheckpointSaverHook.
  INFO:tensorflow:Graph was finalized.
  2019-02-02 04:17:20.655001: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
  INFO:tensorflow:Running local_init_op.
  INFO:tensorflow:Done running local_init_op.
  INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmph861eL/model.ckpt.
  INFO:tensorflow:loss = 2.3235848, step = 1
  INFO:tensorflow:Saving checkpoints for 4 into /tmp/tmph861eL/model.ckpt.
  INFO:tensorflow:Loss for final step: 2.2987146.
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Done calling model_fn.
  INFO:tensorflow:Starting evaluation at 2019-02-02-04:17:21
  INFO:tensorflow:Graph was finalized.
  INFO:tensorflow:Restoring parameters from /tmp/tmph861eL/model.ckpt-4
  INFO:tensorflow:Running local_init_op.
  INFO:tensorflow:Done running local_init_op.
  INFO:tensorflow:Evaluation [1/1]
  INFO:tensorflow:Finished evaluation at 2019-02-02-04:17:22
  INFO:tensorflow:Saving dict for global step 4: accuracy = 0.1640625, global_step = 4, loss = 2.2869625
  INFO:tensorflow:Calling model_fn.
  INFO:tensorflow:Done calling model_fn.
  INFO:tensorflow:Create CheckpointSaverHook.
  INFO:tensorflow:Graph was finalized.
  INFO:tensorflow:Restoring parameters from /tmp/tmph861eL/model.ckpt-4
  INFO:tensorflow:Running local_init_op.
  INFO:tensorflow:Done running local_init_op.
  INFO:tensorflow:Saving checkpoints for 5 into /tmp/tmph861eL/model.ckpt.
  INFO:tensorflow:loss = 2.3025892, step = 5
  INFO:tensorflow:Saving checkpoints for 7 into /tmp/tmph861eL/model.ckpt.
  INFO:tensorflow:Loss for final step: 2.2834966.
  ...
  • When you see log entries similar to those above, your model training is working. You can terminate the container with Ctrl+c.

Next, upload the container image to Container Registry so that you can run it on your GKE cluster.

  • Run the following command to authenticate to Container Registry:

    gcloud auth configure-docker --quiet

  • Push the container to Container Registry:

    docker push ${TRAIN_IMG_PATH}

The push may take a few minutes to complete. You should see Docker progress updates in your command window.

  • Wait until the process is complete, then you should see your new container image listed on the Container Registry page on the GCP console.

Prepare your training component to run on GKE

  • Enter the training/GCS directory:

    cd ${WORKING_DIR}/training/GCS

  • Give the job a name so that you can identify it later:

    kustomize edit add configmap mnist-map-training --from-literal=name=mnist-train-dist

  • Configure your custom training image:

    kustomize edit set image training-image=${TRAIN_IMG_PATH}

  • Configure the image to run distributed by setting the number of parameter servers (numPs) and workers (numWorkers) to use (see the TF_CONFIG sketch after this list):

    ../base/definition.sh --numPs 1 --numWorkers 2

  • Set the training parameters (training steps, batch size, and learning rate):

    kustomize edit add configmap mnist-map-training --from-literal=trainSteps=200
    kustomize edit add configmap mnist-map-training --from-literal=batchSize=100
    kustomize edit add configmap mnist-map-training --from-literal=learningRate=0.01

  • Configure where the training results and exported model are saved in Cloud Storage. Use a subdirectory based on the VERSION_TAG, so that if you run the tutorial more than once, the training can start fresh each time:

    export BUCKET_PATH=${BUCKET_NAME}/${VERSION_TAG}
    kustomize edit add configmap mnist-map-training --from-literal=modelDir=gs://${BUCKET_PATH}/
    kustomize edit add configmap mnist-map-training --from-literal=exportDir=gs://${BUCKET_PATH}/export
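
For background: when the distributed job runs, each replica (chief, parameter server, or worker) learns its role through the TF_CONFIG environment variable, which TensorFlow's Estimator machinery reads (you can see model.py logging TF_CONFIG in the notebook output earlier). You never set this variable yourself; the operator does it for you. The sketch below only illustrates roughly what one worker might receive; the host names and port are assumptions for illustration:

  import json, os

  # Illustrative TF_CONFIG for worker 0 in a 1-PS / 2-worker job.
  tf_config = {
      "cluster": {
          "chief":  ["mnist-train-dist-chief-0:2222"],
          "ps":     ["mnist-train-dist-ps-0:2222"],
          "worker": ["mnist-train-dist-worker-0:2222",
                     "mnist-train-dist-worker-1:2222"],
      },
      "task": {"type": "worker", "index": 0},  # this replica's role
  }
  os.environ["TF_CONFIG"] = json.dumps(tf_config)  # set automatically in the cluster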

Check the permissions for your training component

You need to ensure that your Python code has the required permissions to read/write to your Cloud Storage bucket. Kubeflow solves this by creating a user service account within your project as a part of the deployment. You can use the following command to list the service accounts for your Kubeflow deployment:

  gcloud iam service-accounts list | grep ${DEPLOYMENT_NAME}

Kubeflow granted the user service account the necessary permissions to read and write to your storage bucket. Kubeflow also added a Kubernetes secret named user-gcp-sa to your cluster, containing the credentials needed to authenticate as this service account within the cluster:

  kubectl describe secret user-gcp-sa

To access your storage bucket from inside the train container, you must set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the JSON file contained in the secret. Set the variable by passing the following parameters:

  kustomize edit add configmap mnist-map-training --from-literal=secretName=user-gcp-sa
  kustomize edit add configmap mnist-map-training --from-literal=secretMountPath=/var/secrets
  kustomize edit add configmap mnist-map-training --from-literal=GOOGLE_APPLICATION_CREDENTIALS=/var/secrets/user-gcp-sa.json
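
With those settings, the secret is mounted at /var/secrets and the Google Cloud client libraries inside the container authenticate automatically through Application Default Credentials. A minimal sketch of what code in the container can then do (assuming the google-cloud-storage package; the bucket name is illustrative):

  import os
  from google.cloud import storage

  # The path below is the secretMountPath plus the key file configured above.
  print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])  # /var/secrets/user-gcp-sa.json

  client = storage.Client()  # picks up the key via Application Default Credentials
  for blob in client.bucket("<YOUR-BUCKET-NAME>").list_blobs():
      print(blob.name)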

Train the model on GKE

Now you are ready to run the TensorFlow training job on your cluster on GKE.

Apply the container to the cluster:

  kustomize build . | kubectl apply -f -

When the command finishes running, there should be a new workload on the cluster, with the name mnist-train-dist-chief-0. If you set the option to run a distributed workload, the worker workloads show up on the cluster too. You can see the workloads on the GKE Workloads page on the GCP console. To see the logs, click the mnist-train-dist-chief-0 workload, then click Container logs.

View your trained model on Cloud Storage

When training is complete, you should see the model data pushed into your Cloud Storage bucket, tagged with the same version number as the container that generated it. To explore, click your bucket name on the Cloud Storage page on the GCP Console.

The output from the training application includes the following:

  • A set of checkpoints that you can use to resume training from a given point later.
  • An export directory that holds the trained model in a format that the TensorFlow Serving component can read. Read on to see how to serve your model for prediction using TensorFlow Serving.
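
If you want to verify the export before serving it, you can load the SavedModel directly from the bucket with the TensorFlow 1.x loader API. The sketch below assumes an export path of the shape this tutorial produces; the timestamp directory is illustrative:

  import tensorflow as tf

  export_dir = "gs://<YOUR-BUCKET-NAME>/<VERSION_TAG>/export/<timestamp>"
  with tf.Session(graph=tf.Graph()) as sess:
      meta_graph = tf.saved_model.loader.load(
          sess, [tf.saved_model.tag_constants.SERVING], export_dir)
  print(list(meta_graph.signature_def.keys()))  # expect ['serving_default', 'classes']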

Serve the trained model

Now you can put your trained model on a server and send it prediction requests.

  • Enter the serving/GCS directory:

    cd $WORKING_DIR/serving/GCS

  • Set a name for the TensorFlow Serving job:

    kustomize edit add configmap mnist-map-serving --from-literal=name=mnist-service

  • Set your model path:

    kustomize edit add configmap mnist-map-serving --from-literal=modelBasePath=gs://${BUCKET_PATH}/export

  • Deploy the model, and run a service to make the deployment accessible to other pods in the cluster:

    kustomize build . | kubectl apply -f -

  • You can check the deployment by running the following command:

    kubectl describe deployments mnist-service

  • The service makes the mnist-service deployment accessible over port 9000. Run the following command to get the details of the service:

    kubectl describe service mnist-service

You can also see the mnist-service service on the GKE Services page on the GCP Console. Click the service name to see the service details. You can see that it listens for connections within the cluster on port 9000.

Send online prediction requests to your model

Now you can deploy the final piece of your system: a web interface that can interact with a trained model server.

Deploy the sample web UI

When you downloaded the project files at the start of the tutorial, you downloaded the code for a simple web UI. The code is stored in the ${WORKING_DIR}/web-ui directory.

The web UI uses a Flask server to host the HTML, CSS, and JavaScript files for the web page. The Python program, mnist_client.py, contains a function that interacts directly with the TensorFlow model server.
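
For a sense of what that function does, here is a rough sketch of a TensorFlow Serving gRPC call of the kind mnist_client.py makes. It is not the file’s actual code: the input tensor name "x" and the zero-filled image are assumptions for illustration, and it assumes the grpcio and tensorflow-serving-api packages are installed:

  import grpc
  import numpy as np
  import tensorflow as tf
  from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

  # Cluster-internal DNS name and port of the serving component.
  channel = grpc.insecure_channel("mnist-service:9000")
  stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

  request = predict_pb2.PredictRequest()
  request.model_spec.name = "mnist"
  request.model_spec.signature_name = "serving_default"
  image = np.zeros((1, 784), dtype=np.float32)  # a flattened 28x28 image
  request.inputs["x"].CopyFrom(tf.make_tensor_proto(image))

  response = stub.Predict(request, timeout=10.0)
  print(response.outputs)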

The ${WORKING_DIR}/web-ui directory also contains a Dockerfile to build the application into a container image.

(Optional) Build an image and push it to Container Registry

Follow these steps to build an image from your code:

  • Move back to the project directory:

    cd ${WORKING_DIR}

  • Set the path in Container Registry that you want to push the web UI image to:

    export UI_IMG_PATH=gcr.io/${PROJECT}/${DEPLOYMENT_NAME}-web-ui

  • Build the Docker image for the web-ui directory:

    docker build ${WORKING_DIR}/web-ui -t ${UI_IMG_PATH}

  • Allow Docker access to your Container Registry:

    gcloud auth configure-docker --quiet

  • Push the container to Container Registry:

    docker push ${UI_IMG_PATH}

The push may take a few minutes to complete. You should see Docker progress updates in your command window.

  • Wait until the process is complete, then you should see your new container image listed on the Container Registry page on the GCP console. The container name is <DEPLOYMENT_NAME>-web-ui.

Deploy the web UI to the cluster

The example comes with a simple web front end that you can use with your model. Follow these steps to deploy the web UI to your Kubeflow cluster:

  • Enter the front directory:

    cd ${WORKING_DIR}/front

  • If you chose to build an image for the web UI and uploaded it to Container Registry in the previous step, then you need to update the path to the image in the deployment configuration:

    • Edit the deployment.yaml file in ${WORKING_DIR}/front.
    • Change the image value to match your ${UI_IMG_PATH}. The result should look like this:

      ...
      spec:
        containers:
        - image: gcr.io/<your-project>/<your-deployment-name>-web-ui
      ...

(You can choose to use the image already deployed to Container Registry. Inthat case, you do not need to edit the deployment.yaml file.)

  • Deploy the web front end to your cluster:

    kustomize build . | kubectl apply -f -

Now there should be a new web UI running in the cluster. You can see the web-ui entry on the GKE Workloads page and on the Services page.

Access the web UI in your browser

The web-ui service is of type ClusterIP, which means it can’t be accessed from outside the cluster. To load the web UI in your web browser, set up a direct connection to the cluster:

  kubectl port-forward svc/web-ui 8080:80

Visit http://localhost:8080/ to access the web UI in your browser. (If you’re working from the Cloud Shell, you can instead click the ‘web preview’ button and then select “Preview on Port 8080” to connect.)

  • The web UI offers three fields to connect to the prediction server:

    Connection UI

  • By default, the fields on the above web page are pre-filled with the details of the TensorFlow server that’s running in the cluster: a name, an address, and a port. You can change them if you used different values:

    • Model Name: mnist - The name that you gave to your serving component.

    • Server Address: mnist-service - You can enter the server address as a domain name or an IP address. Note that this is an internal IP address for the mnist-service service within your cluster, not a public address. Kubernetes provides an internal DNS service, so you can write the name of the service in the address field. Kubernetes routes all requests to the required IP address automatically.

    • Port: 9000 - The server listens on port 9000 by default.

  • Click Connect. The system finds the server in your cluster and displays the classification results.

The final product

Below the connect screen, you should see a prediction UI for your MNIST model.

Prediction UI

Each time you refresh the page, it loads a random image from the MNIST test dataset and performs a prediction. In the above screenshot, the image shows a hand-written 7. The table below the image shows a bar graph for each classification label from 0 to 9. Each bar represents the probability that the image matches the respective label.

Clean up your GCP environment

Run the following command to delete your deployment and related resources:

  gcloud deployment-manager --project=${PROJECT} deployments delete ${DEPLOYMENT_NAME}

Delete your Cloud Storage bucket when you’ve finished with it:

  gsutil rm -r gs://${BUCKET_NAME}

Delete the container images uploaded to Container Registry:

  # Find the digest id for each container image:
  gcloud container images list-tags gcr.io/${PROJECT}/${DEPLOYMENT_NAME}-train
  gcloud container images list-tags gcr.io/${PROJECT}/${DEPLOYMENT_NAME}-web-ui
  # Delete each image:
  gcloud container images delete gcr.io/$PROJECT/${DEPLOYMENT_NAME}-train:$DIGEST_ID
  gcloud container images delete gcr.io/$PROJECT/${DEPLOYMENT_NAME}-web-ui:$DIGEST_ID

As an alternative to the command line, you can delete the various resourcesusing the GCP Console.