Tutorial: End-to-end Kubeflow on GCP

Training an MNIST model with Kubeflow on Google Cloud Platform

This guide walks you through an end-to-end example of Kubeflow on Google Cloud Platform (GCP) using a Jupyter notebook, mnist_gcp.ipynb. By working through the notebook, you learn how to deploy Kubeflow on Kubernetes Engine (GKE), train an MNIST machine learning model for image classification, and use the model for online inference (also known as online prediction).

Introduction

Overview of GCP and GKE

Google Cloud Platform (GCP) is a suite of cloud computing services running on Google infrastructure. The services include compute power, data storage, data analytics, and machine learning.

The Cloud SDK is a set of tools that you can use to interact with GCP from the command line, including the gcloud command and others.

Kubernetes Engine (GKE) is a managed service on GCP where you can deploy containerized applications. You describe the resources that your application needs, and GKE provisions and manages the underlying cloud resources.

The primary GCP services that you use when following this guide include Kubernetes Engine (GKE), Cloud Storage, and Cloud Identity-Aware Proxy (IAP).

The model and the data

This tutorial trains a TensorFlow model on the MNIST dataset, which is the “hello world” of machine learning.

The MNIST dataset contains a large number of images of hand-written digits in the range 0 to 9, as well as the labels identifying the digit in each image.
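To make the data and the task concrete, here is a minimal sketch of loading MNIST and training a small classifier with tf.keras. It is not the exact model defined in mnist_gcp.ipynb; it only assumes a TensorFlow 1.15+ environment like the notebook image mentioned later in this guide.

    # Minimal MNIST training sketch (not the exact model from mnist_gcp.ipynb).
    import tensorflow as tf

    # Each image is a 28x28 grayscale array; each label is a digit 0-9.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))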

After training, the model can classify incoming images into 10 categories (0 to 9) based on what it has learned about handwritten images. In other words, you send an image to the model, and the model does its best to identify the digit shown in the image.

Prediction UI

In the above screenshot, the image shows a hand-written 7. This image was the input to the model. The table below the image shows a bar graph for each classification label from 0 to 9, as output by the model. Each bar represents the probability that the image matches the respective label. Judging by this screenshot, the model seems pretty confident that this image is a 7.
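The bar graph in the prediction UI is just a rendering of the model's 10-element output vector. As a rough sketch (reusing the hypothetical `model` and `x_test` from the training example above, not the tutorial's serving path), you can inspect those per-digit probabilities directly:

    # Interpret the model's output the way the prediction UI's bar graph does.
    import numpy as np

    probabilities = model.predict(x_test[:1])[0]  # one probability per digit 0-9
    for digit, p in enumerate(probabilities):
        print("digit %d: %.3f" % (digit, p))
    print("predicted label:", int(np.argmax(probabilities)))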

The overall workflow

The following diagram shows what you accomplish by following this guide:

ML workflow for training and serving an MNIST model

In summary:

  • Setting up Kubeflow in a GKE cluster.

  • Training the model.

  • Using the model for prediction (inference):

    • Saving the trained model to Cloud Storage.
    • Using TensorFlow Serving to serve the model.
    • Running a simple web app to send a prediction request to the model and display the result (a request sketch follows this list).
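The web app in the last step talks to TensorFlow Serving over its REST API. The following sketch shows the shape of such a request; the service host, port, and model name ("mnist") are assumptions, so substitute the address of your own deployment:

    # Rough sketch of a prediction request to TensorFlow Serving's REST API.
    # Host, port, and model name are assumptions; use your deployment's values.
    import json
    import numpy as np
    import requests

    image = np.zeros((28, 28), dtype=float)  # replace with a real MNIST image scaled to [0, 1]
    payload = {"instances": [image.tolist()]}

    resp = requests.post(
        "http://mnist-service:8500/v1/models/mnist:predict",
        data=json.dumps(payload))
    resp.raise_for_status()
    probabilities = resp.json()["predictions"][0]
    print("predicted digit:", int(np.argmax(probabilities)))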

It’s time to get started!

Set up and run the MNIST tutorial on GCP

  • Follow the GCP instructions to deploy Kubeflow with Cloud Identity-Aware Proxy (IAP).

  • Launch a Jupyter notebook in your Kubeflow cluster. See the guide to setting up your notebooks. Note: This tutorial has been tested with the TensorFlow 1.15 CPU image as the baseline image for the notebook (a version-check snippet follows these steps).

  • Launch a terminal in Jupyter and clone the Kubeflow examples repository:

    git clone https://github.com/kubeflow/examples.git git_kubeflow-examples
  • Tip: When you start a terminal in Jupyter, run the command bash to start a bash terminal, which is much more friendly than the default shell.

  • Tip: You can change the URL for your notebook from ‘/tree’ to ‘/lab’ to switch to using JupyterLab.

  • Open the notebook mnist/mnist_gcp.ipynb.

  • Follow the instructions in the notebook to train and deploy MNIST on Kubeflow.
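As a quick sanity check of the notebook environment mentioned in the setup note above (the tutorial was tested against the TensorFlow 1.15 CPU image), you can run a cell like this before starting:

    # Confirm the notebook's TensorFlow version before running the tutorial.
    import tensorflow as tf

    print(tf.__version__)  # expect something like "1.15.x"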
