Predict on an InferenceService with a saved model from a URI

This doc shows how to specify a model object via the URI (Uniform Resource Identifier) of the model object exposed at an HTTP or HTTPS endpoint.

This storageUri option supports single-file models, like sklearn, which is specified by a joblib file, or archives (e.g. tar or zip) which contain all the necessary dependencies for other model types (e.g. tensorflow or pytorch). Here, we'll show examples of both.

Create HTTP/HTTPS header Secret and attach to Service account

Optionally, the HTTP/HTTPS service request headers can be defined in a Secret and attached to a ServiceAccount.

yaml

    apiVersion: v1
    kind: Secret
    metadata:
      name: mysecret
    type: Opaque
    data:
      https-host: ZXhhbXBsZS5jb20=
      headers: |-
        ewoiYWNjb3VudC1uYW1lIjogInNvbWVfYWNjb3VudF9uYW1lIiwKInNlY3JldC1rZXkiOiAic29tZV9zZWNyZXRfa2V5Igp9
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sa
    secrets:
      - name: mysecret

kubectl

    kubectl apply -f create-uri-secret.yaml

Note

Reference the ServiceAccount created above (sa) via serviceAccountName in the predictor spec of your InferenceService. These headers will be applied to any HTTP/HTTPS requests that have the same host.

The headers and host should be base64 encoded.

    example.com
    # echo -n "example.com" | base64
    ZXhhbXBsZS5jb20=
    ---
    {
    "account-name": "some_account_name",
    "secret-key": "some_secret_key"
    }
    # echo -n '{\n"account-name": "some_account_name",\n"secret-key": "some_secret_key"\n}' | base64
    ewoiYWNjb3VudC1uYW1lIjogInNvbWVfYWNjb3VudF9uYW1lIiwKInNlY3JldC1rZXkiOiAic29tZV9zZWNyZXRfa2V5Igp9
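
For reference, the same encoded values can be produced with Python's standard base64 module (a small sketch; the inputs are the example host and headers shown above):

    import base64

    host = "example.com"
    headers = '{\n"account-name": "some_account_name",\n"secret-key": "some_secret_key"\n}'

    # These should match the https-host and headers values used in the Secret above.
    print(base64.b64encode(host.encode()).decode())
    print(base64.b64encode(headers.encode()).decode())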

Sklearn

Train and freeze the model

Here, we’ll train a simple iris model. Please note that KServe requires sklearn==0.20.3.

python

    from sklearn import svm
    from sklearn import datasets
    import joblib

    def train(X, y):
        clf = svm.SVC(gamma='auto')
        clf.fit(X, y)
        return clf

    def freeze(clf, path='../frozen'):
        joblib.dump(clf, f'{path}/model.joblib')
        return True

    if __name__ == '__main__':
        iris = datasets.load_iris()
        X, y = iris.data, iris.target
        clf = train(X, y)
        freeze(clf)
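
Before publishing the artifact, it can be useful to sanity-check that the frozen model loads and predicts. A quick sketch, assuming the ../frozen path used above:

    import joblib

    # Reload the frozen model and score two sample iris measurements.
    clf = joblib.load('../frozen/model.joblib')
    print(clf.predict([[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]))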

Now, the frozen model object can be put somewhere on the web to expose it, for instance by pushing the model.joblib file to some repo on GitHub.

Specify and create the InferenceService

yaml

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: sklearn-from-uri
    spec:
      predictor:
        sklearn:
          storageUri: https://github.com/tduffy000/kfserving-uri-examples/blob/master/sklearn/frozen/model.joblib?raw=true

kubectl

    kubectl apply -f sklearn-from-uri.yaml

Run a prediction

The ingress can now be accessed at ${INGRESS_HOST}:${INGRESS_PORT}; follow this instruction to find out the ingress IP and port.

An example payload is below; save it as input.json:

    {
      "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }

    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-from-uri -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    MODEL_NAME=sklearn-from-uri
    INPUT_PATH=@./input.json
    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH

Expected Output

    $ * Trying 127.0.0.1:8080...
    * TCP_NODELAY set
    * Connected to localhost (127.0.0.1) port 8080 (#0)
    > POST /v1/models/sklearn-from-uri:predict HTTP/1.1
    > Host: sklearn-from-uri.default.example.com
    > User-Agent: curl/7.68.0
    > Accept: */*
    > Content-Length: 76
    > Content-Type: application/x-www-form-urlencoded
    >
    * upload completely sent off: 76 out of 76 bytes
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < content-length: 23
    < content-type: application/json; charset=UTF-8
    < date: Mon, 06 Sep 2021 15:52:55 GMT
    < server: istio-envoy
    < x-envoy-upstream-service-time: 7
    <
    * Connection #0 to host localhost left intact
    {"predictions": [1, 1]}

Tensorflow

This will serve as an example of the ability to also pull in a tarball containing all of the required model dependencies; tensorflow, for instance, requires multiple files in a strict directory structure in order to be servable.

Train and freeze the model

python

    from sklearn import datasets
    import numpy as np
    import tensorflow as tf

    def _ohe(targets):
        y = np.zeros((150, 3))
        for i, label in enumerate(targets):
            y[i, label] = 1.0
        return y

    def train(X, y, epochs, batch_size=16):
        model = tf.keras.Sequential([
            tf.keras.layers.InputLayer(input_shape=(4,)),
            tf.keras.layers.Dense(16, activation=tf.nn.relu),
            tf.keras.layers.Dense(16, activation=tf.nn.relu),
            tf.keras.layers.Dense(3, activation='softmax')
        ])
        model.compile(tf.keras.optimizers.RMSprop(learning_rate=0.001),
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
        model.fit(X, y, epochs=epochs)
        return model

    def freeze(model, path='../frozen'):
        model.save(f'{path}/0001')
        return True

    if __name__ == '__main__':
        iris = datasets.load_iris()
        X, targets = iris.data, iris.target
        y = _ohe(targets)
        model = train(X, y, epochs=50)
        freeze(model)

The post-training procedure here is a bit different. Instead of directly pushing the frozen output to some URI, we'll need to package it into a tarball. To do so:

    cd ../frozen
    tar -cvf artifacts.tar 0001/
    gzip < artifacts.tar > artifacts.tgz

Where we assume the 0001/ directory has the structure:

    |-- 0001/
        |-- saved_model.pb
        |-- variables/
            |-- variables.data-00000-of-00001
            |-- variables.index

Note

Building the tarball from the directory specifying a version number is required for tensorflow.
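
If you prefer to build the archive from Python instead of tar and gzip, the standard tarfile module produces an equivalent gzipped tarball. A sketch, assuming it is run from the frozen/ directory containing 0001/:

    import tarfile

    # Create a gzip-compressed tarball containing the versioned 0001/ directory.
    with tarfile.open("artifacts.tgz", "w:gz") as tar:
        tar.add("0001")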

Specify and create the InferenceService

And again, if everything went to plan we should be able to pull down the tarball and expose the endpoint.

yaml

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: tensorflow-from-uri-gzip
    spec:
      predictor:
        tensorflow:
          storageUri: https://raw.githubusercontent.com/tduffy000/kfserving-uri-examples/master/tensorflow/frozen/model_artifacts.tar.gz

kubectl

    kubectl apply -f tensorflow-from-uri-gzip.yaml

Run a prediction

Again, the ingress can be accessed at ${INGRESS_HOST}:${INGRESS_PORT}; follow this instruction to find out the ingress IP and port.

An example payload is below; save it as input.json:

    {
      "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }

    SERVICE_HOSTNAME=$(kubectl get inferenceservice tensorflow-from-uri-gzip -o jsonpath='{.status.url}' | cut -d "/" -f 3)
    MODEL_NAME=tensorflow-from-uri-gzip
    INPUT_PATH=@./input.json
    curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH

Expected Output

    $ * Trying 10.0.1.16...
    * TCP_NODELAY set
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 10.0.1.16 (10.0.1.16) port 30749 (#0)
    > POST /v1/models/tensorflow-from-uri:predict HTTP/1.1
    > Host: tensorflow-from-uri.default.example.com
    > User-Agent: curl/7.58.0
    > Accept: */*
    > Content-Length: 86
    > Content-Type: application/x-www-form-urlencoded
    >
    } [86 bytes data]
    * upload completely sent off: 86 out of 86 bytes
    < HTTP/1.1 200 OK
    < content-length: 112
    < content-type: application/json
    < date: Thu, 06 Aug 2020 23:21:19 GMT
    < x-envoy-upstream-service-time: 151
    < server: istio-envoy
    <
    { [112 bytes data]
    100   198  100   112  100    86    722    554 --:--:-- --:--:-- --:--:--  1285
    * Connection #0 to host 10.0.1.16 left intact
    {
      "predictions": [
        [
          0.0204100646,
          0.680984616,
          0.298605353
        ],
        [
          0.0296604875,
          0.658412039,
          0.311927497
        ]
      ]
    }