Predict on an InferenceService with a saved model on PVC

This doc shows how to store a model on a Persistent Volume Claim (PVC) and create an InferenceService that serves the model saved on the PVC.

Create PV and PVC

Refer to the Kubernetes documentation on persistent volumes to create a Persistent Volume (PV) and a Persistent Volume Claim (PVC); the PVC will be used to store the model. This example uses a local PV.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/home/ubuntu/mnt/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

```bash
kubectl apply -f pv-and-pvc.yaml
```
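
Before copying the model, you can check that the claim has bound to the volume:

```bash
# Both should report STATUS "Bound" once Kubernetes matches the claim to the volume.
kubectl get pv task-pv-volume
kubectl get pvc task-pv-claim
```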

Copy model to PV

Run the pod model-store-pod and log in to the model-store container.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-store-pod
spec:
  volumes:
    - name: model-store
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: model-store
      image: ubuntu
      command: [ "sleep" ]
      args: [ "infinity" ]
      volumeMounts:
        - mountPath: "/pv"
          name: model-store
      resources:
        limits:
          memory: "1Gi"
          cpu: "1"
```

```bash
kubectl apply -f pv-model-store.yaml
kubectl exec -it model-store-pod -- bash
```
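
If the exec command fails because the pod has not started yet, one way to avoid the race is to wait for the pod to become ready first:

```bash
# Block until model-store-pod reports the Ready condition (up to two minutes).
kubectl wait --for=condition=Ready pod/model-store-pod --timeout=120s
```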

In a different terminal, copy the model from the local filesystem into the PV. If you do not already have a model.joblib, the sketch below shows one way to produce one.
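
This is a minimal sketch, not part of the serving steps: it assumes python3 with scikit-learn and joblib installed locally, and trains a small classifier on the iris dataset, which should line up with the two predictions in the expected output at the end of this doc.

```bash
# Assumption: scikit-learn and joblib are available locally.
# Train a small SVM on the iris dataset and save it as model.joblib.
python3 -c '
from joblib import dump
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(gamma="scale")
clf.fit(X, y)
dump(clf, "model.joblib")
'
```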

```bash
kubectl cp model.joblib model-store-pod:/pv/model.joblib -c model-store
```
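
You can then confirm that the model landed on the PV from inside the pod:

```bash
# model.joblib should be listed under the PV mount.
kubectl exec model-store-pod -c model-store -- ls -lh /pv
```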

Deploy InferenceService with models on PVC

Update ${PVC_NAME} to the name of the PVC created above (task-pv-claim in this example) and create the InferenceService with the PVC storageUri.

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-pvc"
spec:
  predictor:
    sklearn:
      storageUri: "pvc://${PVC_NAME}/model.joblib"
```

```bash
kubectl apply -f sklearn-pvc.yaml
```
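
Wait for the InferenceService to report ready before sending traffic:

```bash
# READY should show True and URL should be populated once the predictor is up.
kubectl get inferenceservice sklearn-pvc
```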

Run a prediction

Now the ingress can be accessed at ${INGRESS_HOST}:${INGRESS_PORT}, or follow these instructions to find out the ingress IP and port.

```bash
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-pvc -o jsonpath='{.status.url}' | cut -d "/" -f 3)
MODEL_NAME=sklearn-pvc
INPUT_PATH=@./input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict -d $INPUT_PATH
```
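
The contents of input.json are not shown in this doc. Assuming the iris classifier sketched earlier, a matching payload could be written like this; the feature values are illustrative, so adjust them to your model's input schema:

```bash
# Hypothetical payload: two iris instances with four features each.
cat <<EOF > input.json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
```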

Expected Output

```
* Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> POST /v1/models/sklearn-pvc:predict HTTP/1.1
> Host: sklearn-pvc.default.example.com
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Length: 84
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 84 out of 84 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 23
< content-type: application/json; charset=UTF-8
< date: Mon, 20 Sep 2021 04:55:50 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 6
<
* Connection #0 to host localhost left intact
{"predictions": [1, 1]}
```