TensorFlow Serving

Serving TensorFlow models

Out of date

This guide contains outdated information pertaining to Kubeflow 1.0. This guide needs to be updated for Kubeflow 1.1.

This Kubeflow component has stable status. See the Kubeflow versioning policies.

Serving a model

To deploy a model, create the following resources, as illustrated below:

  • A Deployment to deploy the model using TF Serving
  • A Kubernetes Service to create an endpoint for the Deployment
  • An Istio VirtualService to route traffic to the model and expose it through the Istio gateway
  • An Istio DestinationRule for traffic splitting

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mnist
  name: mnist-service
  namespace: kubeflow
spec:
  ports:
  - name: grpc-tf-serving
    port: 9000
    targetPort: 9000
  - name: http-tf-serving
    port: 8500
    targetPort: 8500
  selector:
    app: mnist
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mnist
  name: mnist-v1
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: mnist
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
      labels:
        app: mnist
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=YOUR_MODEL
        command:
        - /usr/bin/tensorflow_model_server
        image: tensorflow/serving:1.11.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: config-volume
      volumes:
      - configMap:
          name: mnist-v1-config
        name: config-volume
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  labels:
  name: mnist-service
  namespace: kubeflow
spec:
  host: mnist-service
  subsets:
  - labels:
      version: v1
    name: v1
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  labels:
  name: mnist-service
  namespace: kubeflow
spec:
  gateways:
  - kubeflow-gateway
  hosts:
  - '*'
  http:
  - match:
    - method:
        exact: POST
      uri:
        prefix: /tfserving/models/mnist
    rewrite:
      uri: /v1/models/mnist:predict
    route:
    - destination:
        host: mnist-service
        port:
          number: 8500
        subset: v1
      weight: 100
```

Referring to the above example, you can customize your deployment by changing the following configurations in the YAML file:

  • In the deployment resource, the model_base_path argument points to the model. Change the value to your own model.

  • The Google Cloud example in the section below adds three configurations for Google Cloud Storage (GCS) access: volumes (secret user-gcp-sa), volumeMounts, and env (GOOGLE_APPLICATION_CREDENTIALS). If your model is not on GCS (for example, it is on AWS S3), see the section below on how to set up access.

  • GPU. If you want to use a GPU, add nvidia.com/gpu: 1 to the container's resource limits, and use a GPU image, for example tensorflow/serving:1.11.1-gpu:

    ```yaml
    resources:
      limits:
        cpu: "4"
        memory: 4Gi
        nvidia.com/gpu: 1
    ```
  • The VirtualService and DestinationRule resources handle routing. With the example above, the model is accessible at HOSTNAME/tfserving/models/mnist (HOSTNAME is your Kubeflow deployment hostname). To change the path, edit the http.match.uri of the VirtualService.

Pointing to the model

Depending on where the model file is located, set the correct parameters:

Google Cloud

Change the deployment spec as follows:

```yaml
spec:
  selector:
    matchLabels:
      app: mnist
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
      labels:
        app: mnist
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=mnist
        - --model_base_path=gs://kubeflow-examples-data/mnist
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /secret/gcp-credentials/user-gcp-sa.json
        image: tensorflow/serving:1.11.1-gpu
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: mnist
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
            nvidia.com/gpu: 1
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: config-volume
        - mountPath: /secret/gcp-credentials
          name: gcp-credentials
      volumes:
      - configMap:
          name: mnist-v1-config
        name: config-volume
      - name: gcp-credentials
        secret:
          secretName: user-gcp-sa
```

The changes are:

  • environment variable GOOGLE_APPLICATION_CREDENTIALS
  • volume gcp-credentials
  • volumeMount gcp-credentials

We need a service account that can access the model. If you are using Kubeflow’s click-to-deploy app, a secret named user-gcp-sa should already exist in the cluster.

The model at gs://kubeflow-examples-data/mnist is publicly accessible. However, if your environment doesn’t have Google Cloud credentials set up, TF Serving will not be able to read the model. See this issue for an example. To set up the credentials, either point the environment variable GOOGLE_APPLICATION_CREDENTIALS to the credential file, or run gcloud auth login. See the documentation for more detail.

S3

To use S3, first create a secret that contains your access credentials. Use base64 to encode the credentials, and see the Kubernetes guide to creating a secret manually for details:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: secretname
data:
  AWS_ACCESS_KEY_ID: bmljZSB0cnk6KQ==
  AWS_SECRET_ACCESS_KEY: YnV0IHlvdSBkaWRuJ3QgZ2V0IG15IHNlY3JldCE=
```
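
The values under data must be base64-encoded. A minimal sketch of producing them in Python (the credentials shown are placeholders, not real AWS keys):

```python
import base64

# Kubernetes Secret `data` values must be base64-encoded strings.
# These credentials are placeholders for illustration only.
access_key_id = "AKIAEXAMPLE"
secret_access_key = "examplesecretkey"

encoded_id = base64.b64encode(access_key_id.encode("utf-8")).decode("ascii")
encoded_secret = base64.b64encode(secret_access_key.encode("utf-8")).decode("ascii")

print(encoded_id)      # value to paste into AWS_ACCESS_KEY_ID
print(encoded_secret)  # value to paste into AWS_SECRET_ACCESS_KEY
```

You can achieve the same with `echo -n KEY | base64` on the command line; either way, avoid a trailing newline in the encoded value.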

Then use the following manifest as an example:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: s3
  name: s3
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: s3
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: null
      labels:
        app: s3
        version: v1
    spec:
      containers:
      - args:
        - --port=9000
        - --rest_api_port=8500
        - --model_name=s3
        - --model_base_path=s3://abc
        - --monitoring_config_file=/var/config/monitoring_config.txt
        command:
        - /usr/bin/tensorflow_model_server
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: AWS_ACCESS_KEY_ID
              name: secretname
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: AWS_SECRET_ACCESS_KEY
              name: secretname
        - name: AWS_REGION
          value: us-west-1
        - name: S3_USE_HTTPS
          value: "true"
        - name: S3_VERIFY_SSL
          value: "true"
        - name: S3_ENDPOINT
          value: s3.us-west-1.amazonaws.com
        image: tensorflow/serving:1.11.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          initialDelaySeconds: 30
          periodSeconds: 30
          tcpSocket:
            port: 9000
        name: s3
        ports:
        - containerPort: 9000
        - containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 1Gi
        volumeMounts:
        - mountPath: /var/config/
          name: config-volume
      volumes:
      - configMap:
          name: s3-config
        name: config-volume
```

Note that spec.selector.matchLabels must match the pod template labels (app: s3 here); a mismatch causes Kubernetes to reject the Deployment.

Sending a prediction request directly

If the service type is LoadBalancer, it has its own externally accessible IP address. Get the external IP with:

```shell
kubectl get svc mnist-service
```

Then send the request:

```shell
curl -X POST -d @input.json http://EXTERNAL_IP:8500/v1/models/mnist:predict
```
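
The input.json payload follows TF Serving's REST predict API: a JSON object with an instances list, one entry per example. A minimal sketch of building the payload and request in Python's standard library (the 28x28 zero image and the EXTERNAL_IP placeholder are illustrative, not real values):

```python
import json
import urllib.request

# TF Serving's REST predict API expects {"instances": [...]}; for MNIST
# each instance is a 28x28 array of pixel values. A single all-zero
# image is used here purely as a placeholder.
instances = [[[0.0] * 28 for _ in range(28)]]
payload = json.dumps({"instances": instances}).encode("utf-8")

# Replace EXTERNAL_IP with the address from `kubectl get svc mnist-service`.
url = "http://EXTERNAL_IP:8500/v1/models/mnist:predict"
request = urllib.request.Request(
    url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the service is reachable:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())["predictions"]
```

The response mirrors the request shape: a JSON object with a predictions list, one entry per instance.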

Sending a prediction request through ingress and IAP

If the service type is ClusterIP, you can access the model through the ingress. The endpoint is protected, so only callers with the right credentials can reach it. The steps below show how to programmatically authenticate a service account to access IAP.

  1. Save the client ID that you used to deploy Kubeflow as IAP_CLIENT_ID.

  2. Create a service account:

     ```shell
     gcloud iam service-accounts create --project=$PROJECT $SERVICE_ACCOUNT
     ```

  3. Grant the service account access to IAP-enabled resources:

     ```shell
     gcloud projects add-iam-policy-binding $PROJECT \
       --role roles/iap.httpsResourceAccessor \
       --member serviceAccount:$SERVICE_ACCOUNT
     ```

  4. Download the service account key:

     ```shell
     gcloud iam service-accounts keys create ${KEY_FILE} \
       --iam-account ${SERVICE_ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
     ```

  5. Export the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the key file of the service account.

Finally, you can send a request with an input file using this Python script:

```shell
python iap_request.py https://YOUR_HOST/tfserving/models/mnist IAP_CLIENT_ID --input=YOUR_INPUT_FILE
```

To send a GET request:

```shell
python iap_request.py https://YOUR_HOST/models/MODEL_NAME/ IAP_CLIENT_ID
```
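
At its core, iap_request.py attaches an OpenID Connect identity token for the service account as a bearer token on each request. A hedged sketch of the request shape, with the token-fetching step (which the script performs via Google's auth libraries) stubbed out as a placeholder string:

```python
import json
import urllib.request

def make_iap_request(url, id_token, instances):
    """Build a predict request authorized for an IAP-protected endpoint.

    `id_token` must be an OIDC identity token minted for the IAP client ID;
    iap_request.py obtains it from the service account key file pointed to
    by GOOGLE_APPLICATION_CREDENTIALS. Here it is a placeholder.
    """
    payload = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # IAP checks this bearer token before forwarding the request.
            "Authorization": "Bearer " + id_token,
        },
        method="POST",
    )

req = make_iap_request(
    "https://YOUR_HOST/tfserving/models/mnist",  # placeholder host
    "REPLACE_WITH_OIDC_TOKEN",                   # placeholder token
    [[0.0]],                                     # placeholder instance
)
```

This is only a sketch of the request structure; use the provided iap_request.py for the full token exchange.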

Telemetry and rolling out a model using Istio

Please look at the Istio guide.

Logs and metrics with Stackdriver

See the guide to logging and monitoring for instructions on getting logs and metrics using Stackdriver.

Last modified 21.04.2021: Components v external add ons (#2630) (42f08be0)