Securing Your Clusters

How to secure Kubeflow clusters using VPC Service Controls and private GKE

Alpha version

This feature is currently in alpha release status with limited support. The Kubeflow team is interested in any feedback you may have, in particular with regard to usability of the feature.

This guide describes how to secure Kubeflow using VPC Service Controls and private GKE.

Together these two features significantly increase security and mitigate the risk of data exfiltration.

  • VPC Service Controls allow you to define a perimeter around Google Cloud Platform (GCP) services.

Kubeflow uses VPC Service Controls to prevent applications running on GKE from writing data to GCP resources outside the perimeter.

  • Private GKE removes public IP addresses from GKE nodes, making them inaccessible from the public internet.

Kubeflow uses IAP to make Kubeflow web apps accessible from your browser.

VPC Service Controls allow you to restrict which Google services are accessible from your GKE/Kubeflow clusters. This is an important part of security, and in particular helps mitigate the risk of data exfiltration.

For more information refer to the VPC Service Control Docs.

Creating a private Kubernetes Engine cluster means the Kubernetes Engine nodes won't have public IP addresses. This can improve security by blocking unwanted outbound/inbound access to nodes. Removing IP addresses means external services (such as GitHub, PyPI, and DockerHub) won't be accessible from the nodes. Google services (such as BigQuery and Cloud Storage) are still accessible.

Importantly, this means you can continue to use your Google Container Registry (GCR) to host your Docker images. Other Docker registries (for example, DockerHub) will not be accessible. If you need to use Docker images hosted outside GCR, you can use the scripts provided by Kubeflow to mirror them to your GCR registry.
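For a one-off image, you can also mirror it by hand from a machine that still has public internet access. A minimal sketch, using busybox purely as an example image name:

```
# Run from a machine with public internet access and push access to GCR.
# busybox:latest is only an example; substitute the images you actually
# need, and replace <project-id> with your GCP project id.
docker pull busybox:latest
docker tag busybox:latest gcr.io/<project-id>/busybox:latest
docker push gcr.io/<project-id>/busybox:latest
```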

Before you start

Before installing Kubeflow, ensure you have installed the necessary tools; this guide uses gcloud and kubectl throughout.

You will need to know your gcloud organization ID and project number; you can get them via gcloud:

```
export PROJECT=<your GCP project id>
export ORGANIZATION_NAME=<name of your organization>
export ORGANIZATION=$(gcloud organizations list --filter=DISPLAY_NAME=${ORGANIZATION_NAME} --format='value(name)')
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT} --format='value(projectNumber)')
```
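As a quick sanity check, confirm both lookups returned non-empty values before continuing:

```
# Both should print non-empty values; if either is blank, re-check the
# organization name and project ID above.
echo "Organization: ${ORGANIZATION}"
echo "Project number: ${PROJECT_NUMBER}"
```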

Enable VPC Service Controls In Your Project

  • Enable VPC Service Controls:

```
export PROJECT=<Your project>
gcloud services enable accesscontextmanager.googleapis.com \
    cloudresourcemanager.googleapis.com \
    dns.googleapis.com --project=${PROJECT}
```
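  • To confirm the APIs are enabled, you can filter the list of enabled services (an optional check):

```
# Each of the three APIs enabled above should appear in the output.
gcloud services list --enabled --project=${PROJECT} \
    | grep -E 'accesscontextmanager|cloudresourcemanager|dns'
```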
  • Check if you have an access policy object already created:

```
gcloud beta access-context-manager policies list \
    --organization=${ORGANIZATION}
```
  • An access policy is a GCP resource object that defines service perimeters. There can be only one access policy object in an organization, and it is a child of the Organization resource.
    • If you don't have an access policy object, create one:

```
gcloud beta access-context-manager policies create \
    --title "default" --organization=${ORGANIZATION}
```
  • Save the access policy object ID as an environment variable so that it can be used in subsequent commands:

```
export POLICYID=$(gcloud beta access-context-manager policies list --organization=${ORGANIZATION} --limit=1 --format='value(name)')
```
  • Create a service perimeter:

```
gcloud beta access-context-manager perimeters create KubeflowZone \
    --title="Kubeflow Zone" --resources=projects/${PROJECT_NUMBER} \
    --restricted-services=bigquery.googleapis.com,containerregistry.googleapis.com,storage.googleapis.com \
    --project=${PROJECT} --policy=${POLICYID}
```
  • Here we have created a service perimeter with the name KubeflowZone.

  • The perimeter is created in the project referenced by PROJECT_NUMBER and restricts access to GCS (storage.googleapis.com), BigQuery (bigquery.googleapis.com), and GCR (containerregistry.googleapis.com).

  • Placing GCS (Google Cloud Storage) and BigQuery in the perimeter means that access to GCS and BigQuery resources owned by this project is now restricted. By default, access from outside the perimeter will be blocked.

  • More than one project can be added to the same perimeter.
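  • As a concrete illustration of the perimeter's effect: once the perimeter is enforced, a request to a restricted service from outside the perimeter (and not covered by an access level) should fail. The bucket name below is hypothetical:

```
# Run from a machine OUTSIDE the perimeter; my-kubeflow-data is a
# hypothetical bucket in the protected project.
gsutil ls gs://my-kubeflow-data
# Expected: an access denied error citing a VPC Service Controls violation.
```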

  • Create an access level to allow Google Container Builder to access resources inside the perimeter:

    • Create a members.yaml file with the following contents:

```
- members:
    - serviceAccount:${PROJECT_NUMBER}@cloudbuild.gserviceaccount.com
    - user:<your email>
```
  • Google Container Builder is used to mirror Kubeflow images into the perimeter.

  • Adding your email allows you to access the GCP services inside the perimeter from outside the cluster.

    • This is convenient for building and pushing images and data from your local machine.
  • For more information refer to the docs.
  • Create the access level:

```
gcloud beta access-context-manager levels create kubeflow \
    --basic-level-spec=members.yaml \
    --policy=${POLICYID} \
    --title="Kubeflow ${PROJECT}"
```

  • The name for the level can't have any hyphens.
    • Bind the access level to the service perimeter:

```
gcloud beta access-context-manager perimeters update KubeflowZone \
    --add-access-levels=kubeflow \
    --policy=${POLICYID}
```
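    • To verify the binding, you can describe the perimeter; the output should list kubeflow under the perimeter's access levels and the three services under its restricted services:

```
gcloud beta access-context-manager perimeters describe KubeflowZone \
    --policy=${POLICYID}
```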
  • Set up container registry for GKE private clusters (for more info see instructions):

    • Create a managed private zone:

```
export ZONE_NAME=kubeflow
export NETWORK=<Network you are using for your cluster>
gcloud beta dns managed-zones create ${ZONE_NAME} \
    --visibility=private \
    --networks=https://www.googleapis.com/compute/v1/projects/${PROJECT}/global/networks/${NETWORK} \
    --description="Kubeflow DNS" \
    --dns-name=gcr.io \
    --project=${PROJECT}
```
    • Start a transaction:

```
gcloud dns record-sets transaction start \
    --zone=${ZONE_NAME} \
    --project=${PROJECT}
```
    • Add a CNAME record for *.gcr.io:

```
gcloud dns record-sets transaction add \
    --name=*.gcr.io. \
    --type=CNAME gcr.io. \
    --zone=${ZONE_NAME} \
    --ttl=300 \
    --project=${PROJECT}
```
    • Add an A record for the restricted VIP:

```
gcloud dns record-sets transaction add \
    --name=gcr.io. \
    --type=A 199.36.153.4 199.36.153.5 199.36.153.6 199.36.153.7 \
    --zone=${ZONE_NAME} \
    --ttl=300 \
    --project=${PROJECT}
```
    • Commit the transaction:

```
gcloud dns record-sets transaction execute \
    --zone=${ZONE_NAME} \
    --project=${PROJECT}
```
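    • You can verify the zone now contains the expected records:

```
# Expect a CNAME for *.gcr.io. and an A record for gcr.io. pointing at
# 199.36.153.4-7 (the restricted VIP range).
gcloud dns record-sets list \
    --zone=${ZONE_NAME} \
    --project=${PROJECT}
```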

Deploy Kubeflow with Private GKE

  • Set user credentials. You only need to run this command once:

```
gcloud auth application-default login
```
  • Copy non-GCR hosted images to your GCR registry:

    • Clone the Kubeflow source:

```
git clone https://github.com/kubeflow/kubeflow.git git_kubeflow
```

    • Then run the copy job:

```
cd git_kubeflow/scripts/gke
PROJECT=<PROJECT> make copy-gcb
```

    • This is needed because your GKE nodes won't be able to pull images from non-GCR registries, since they don't have public internet addresses.

    • gcloud may return an error even though the job is submitted successfully and will run successfully; see kubeflow/kubeflow#3105.

    • You can use the Cloud console to monitor your GCB job.
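    • You can also check build status from the command line (an optional alternative to the Cloud console):

```
# Show recent Cloud Build jobs and their status.
gcloud builds list --project=${PROJECT} --limit=5
```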

  • Follow the guide to deploying Kubeflow on GCP. When you reach the setup and deploy step, skip the kfctl apply command and run the kfctl build command instead, as described in that step. Now you can edit the configuration files before deploying Kubeflow. Retain the environment variables that you set during the setup, including ${KF_NAME}, ${KF_DIR}, and ${CONFIG_FILE}.

  • Enable private clusters by editing ${KF_DIR}/gcp_config/cluster-kubeflow.yaml and updating the following two parameters:

```
privatecluster: true
gkeApiVersion: v1beta1
```
  • Remove components which are not useful in private clusters:

```
cd ${KF_DIR}/kustomize
kubectl delete -f cert-manager.yaml
```
  • Create the deployment:

```
cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_FILE}
```

    • If you get an error "legacy networks not supported", follow the troubleshooting guide to create a new network.

    • You will need to manually create the network as a workaround for kubeflow/kubeflow#3071:

```
cd ${KF_DIR}/gcp_config
gcloud --project=${PROJECT} deployment-manager deployments create ${KF_NAME}-network --config=network.yaml
```

    • Then edit ${KF_DIR}/gcp_config/cluster.jinja to add a network field to your cluster:

```
cluster:
  name: {{ CLUSTER_NAME }}
  network: <name of the new network>
```

    • To get the name of the new network, run:

```
gcloud --project=${PROJECT} compute networks list
```

    • The name will contain the value ${KF_NAME}.
  • Update iap-ingress component parameters:

```
cd ${KF_DIR}/kustomize
gvim iap-ingress.yaml
```

    • Find and set the privateGKECluster parameter to true:

```
privateGKECluster: "true"
```

    • Then apply your changes:

```
kubectl apply -f iap-ingress.yaml
```
  • Obtain an HTTPS certificate for your ${FQDN} and create a Kubernetes secret with it.

    • You can create a self-signed cert using kube-rsa:

```
go get github.com/kelseyhightower/kube-rsa
kube-rsa ${FQDN}
```

    • The fully qualified domain name is the host field specified for your ingress; you can get it by running:

```
cd ${KF_DIR}/kustomize
grep hostname: iap-ingress.yaml
```

    • Then create your Kubernetes secret:

```
kubectl create secret tls --namespace=kubeflow envoy-ingress-tls --cert=ca.pem --key=ca-key.pem
```

    • An alternative option is to upgrade to GKE 1.12 or later and use managed certificates.

    • See kubeflow/kubeflow#3079.
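    • To confirm the secret was created in the right namespace:

```
kubectl get secret envoy-ingress-tls --namespace=kubeflow
```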

  • Update the various kustomize manifests to use gcr.io images instead of Docker Hub images.
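    • One way to find manifests that still reference images outside GCR is a simple grep over the kustomize directory (a rough sketch; adjust the pattern to your layout):

```
cd ${KF_DIR}/kustomize
# List image references that do not point at gcr.io; each needs to be
# switched to the copy you mirrored into your registry.
grep -rn "image:" . | grep -v "gcr.io"
```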

  • Apply all the Kubernetes resources:

```
cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_FILE}
```

  • Wait for Kubeflow to become accessible and then access it at this URL:

```
https://${FQDN}/
```

    • ${FQDN} is the host associated with your ingress.

    • You can get it by running kubectl get ingress.

    • Follow the instructions to monitor the deployment.

    • It can take 10-20 minutes for the endpoint to become fully available.
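    • A quick way to check progress from the command line (the exact output varies by deployment, and -k is only needed for a self-signed cert):

```
# Wait until the ingress reports an external address.
kubectl get ingress --namespace=kubeflow
# Then probe the endpoint; -k skips TLS verification for self-signed certs.
curl -k -I https://${FQDN}/
```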

Next steps
