Predict on an InferenceService with a saved model on S3

Create S3 Secret and attach to Service Account

Create a secret with your S3 user credentials. KServe reads the secret annotations to inject the S3 environment variables into the storage initializer or model agent so it can download the models from S3 storage.

Create S3 secret

yaml

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: s3creds
  annotations:
    serving.kserve.io/s3-endpoint: s3.amazonaws.com # replace with your s3 endpoint e.g minio-service.kubeflow:9000
    serving.kserve.io/s3-usehttps: "1" # by default 1, if testing with minio you can set to 0
    serving.kserve.io/s3-region: "us-east-2"
    serving.kserve.io/s3-useanoncredential: "false" # omitting this is the same as false, if true will ignore provided credential and use anonymous credentials
type: Opaque
stringData: # use `stringData` for raw credential string or `data` for base64 encoded string
  AWS_ACCESS_KEY_ID: XXXX
  AWS_SECRET_ACCESS_KEY: XXXXXXXX
```

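If you prefer the `data` field over `stringData`, the credential values must be base64 encoded first. A minimal sketch, assuming the placeholder values above stand in for your real access key and secret access key:

```bash
# Base64-encode the credentials for use in the `data` field of the Secret
# (XXXX / XXXXXXXX are placeholders, not real credentials)
echo -n "XXXX" | base64
echo -n "XXXXXXXX" | base64
```
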
Attach secret to a service account

yaml

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa
secrets:
- name: s3creds
```

kubectl

```bash
kubectl apply -f create-s3-secret.yaml
```
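
Optionally, verify that the secret exists and is attached to the service account; the resource names below assume the manifests above.

```bash
# Confirm the secret was created (values are shown base64 encoded by Kubernetes)
kubectl get secret s3creds -o yaml

# Confirm the service account references the secret
kubectl get serviceaccount sa -o jsonpath='{.secrets[*].name}'
```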

Note

If you are running KServe with Istio sidecars enabled, there can be a race condition between the Istio proxy becoming ready and the agent pulling models. This results in a `tcp dial connection refused` error when the agent tries to download from S3.

To resolve it, Istio allows blocking the other containers in a pod until the proxy container is ready.

You can enable this by setting `proxy.holdApplicationUntilProxyStarts: true` in the `istio-sidecar-injector` configmap. The `proxy.holdApplicationUntilProxyStarts` flag was introduced in Istio 1.7 as an experimental feature and is turned off by default.
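
For reference, one way to turn the flag on mesh-wide is through the IstioOperator configuration when (re)installing Istio. This is a sketch assuming Istio is managed with `istioctl`; the flag can also be set per pod with the `proxy.istio.io/config` annotation.

```bash
# Enable holdApplicationUntilProxyStarts for the whole mesh (Istio >= 1.7)
istioctl install --set meshConfig.defaultConfig.holdApplicationUntilProxyStarts=true
```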

Deploy the model on S3 with InferenceService

Create the InferenceService with the S3 `storageUri` and the service account with the S3 credentials attached.

yaml

  1. apiVersion: "serving.kserve.io/v1beta1"
  2. kind: "InferenceService"
  3. metadata:
  4. name: "mnist-s3"
  5. spec:
  6. predictor:
  7. serviceAccountName: sa
  8. tensorflow:
  9. storageUri: "s3://kserve-examples/mnist"

kubectl

```bash
kubectl apply -f mnist-s3.yaml
```
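
You can check that the InferenceService comes up and the model was pulled from S3 before sending traffic; a small sketch, assuming the `mnist-s3` name from the manifest above:

```bash
# Wait for the InferenceService to report Ready (model downloaded and predictor up)
kubectl wait --for=condition=Ready inferenceservice/mnist-s3 --timeout=300s
kubectl get inferenceservice mnist-s3
```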

Run a prediction

Now, the ingress can be accessed at ${INGRESS_HOST}:${INGRESS_PORT} or follow this instruction to find out the ingress IP and port.

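If you are using the Istio ingress gateway, a common way to determine the ingress host and port looks like the sketch below; it assumes a LoadBalancer service named `istio-ingressgateway` in the `istio-system` namespace.

```bash
# Look up the external IP and HTTP port of the Istio ingress gateway
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```
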
```bash
SERVICE_HOSTNAME=$(kubectl get inferenceservice mnist-s3 -o jsonpath='{.status.url}' | cut -d "/" -f 3)
MODEL_NAME=mnist-s3
INPUT_PATH=@./input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
```

Expected Output

```
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 35.237.217.209...
* TCP_NODELAY set
* Connected to mnist-s3.default.35.237.217.209.xip.io (35.237.217.209) port 80 (#0)
> POST /v1/models/mnist-s3:predict HTTP/1.1
> Host: mnist-s3.default.35.237.217.209.xip.io
> User-Agent: curl/7.55.1
> Accept: */*
> Content-Length: 2052
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< content-length: 251
< content-type: application/json
< date: Sun, 04 Apr 2021 20:06:27 GMT
< x-envoy-upstream-service-time: 5
< server: istio-envoy
<
{
    "predictions": [
        {
            "predictions": [0.327352405, 2.00153053e-07, 0.0113353515, 0.203903764, 3.62863029e-05, 0.416683704, 0.000281196437, 8.36911859e-05, 0.0403052084, 1.82206513e-05],
            "classes": 5
        }
    ]
}* Connection #0 to host mnist-s3.default.35.237.217.209.xip.io left intact
```