First InferenceService

Run your first InferenceService

In this tutorial, you will deploy a ScikitLearn InferenceService.

The InferenceService loads a simple iris ML model; you then send it a list of flower attributes, and it returns the predicted class of iris plant.

Since your model is being deployed as an InferenceService, not a raw Kubernetes Service, you just need to provide the trained model and it gets some super powers out of the box 🚀.

1. Create a test InferenceService

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-samples/models/sklearn/iris"
```

Once you’ve created your YAML file (named something like “sklearn.yaml”):

```shell
kubectl create namespace kserve-test
kubectl apply -f sklearn.yaml -n kserve-test
```

2. Check InferenceService status

```shell
kubectl get inferenceservices sklearn-iris -n kserve-test
NAME           URL                                           READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                    AGE
sklearn-iris   http://sklearn-iris.kserve-test.example.com   True           100                              sklearn-iris-predictor-default-47q2g   7d23h
```

If your URL still shows the default example.com domain, consult your admin about configuring DNS or using a custom domain.

3. Determine the ingress IP and ports

Execute the following command to determine whether your Kubernetes cluster is running in an environment that supports external load balancers:

```shell
kubectl get svc istio-ingressgateway -n istio-system
NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)   AGE
istio-ingressgateway   LoadBalancer   172.21.109.129   130.211.10.121   ...       17h
```

Load Balancer

If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway.

```shell
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```
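The two jsonpath expressions above index into the service's JSON: the first takes the first load-balancer ingress IP, the second filters the port list for the entry named `http2`. As a sketch of what kubectl is doing, here is the equivalent in plain Python (the service dict below is a trimmed, illustrative sample, not output from a real cluster):

```python
# A trimmed sketch of what `kubectl get service istio-ingressgateway -o json`
# returns (illustrative values only).
service = {
    "spec": {
        "ports": [
            {"name": "status-port", "port": 15021},
            {"name": "http2", "port": 80},
            {"name": "https", "port": 443},
        ]
    },
    "status": {"loadBalancer": {"ingress": [{"ip": "130.211.10.121"}]}},
}

# Equivalent of jsonpath '{.status.loadBalancer.ingress[0].ip}'
ingress_host = service["status"]["loadBalancer"]["ingress"][0]["ip"]

# Equivalent of jsonpath '{.spec.ports[?(@.name=="http2")].port}'
ingress_port = next(p["port"] for p in service["spec"]["ports"] if p["name"] == "http2")

print(ingress_host, ingress_port)
```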

Node Port

If the EXTERNAL-IP value is none (or perpetually pending), your environment does not provide an external load balancer for the ingress gateway. In this case, you can access the gateway using the service’s node port.

```shell
# GKE
export INGRESS_HOST=worker-node-address
# Minikube
export INGRESS_HOST=$(minikube ip)
# Other environment (on-prem)
export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
```

Port Forward

Alternatively, you can port-forward for testing purposes:

```shell
INGRESS_GATEWAY_SERVICE=$(kubectl get svc --namespace istio-system --selector="app=istio-ingressgateway" --output jsonpath='{.items[0].metadata.name}')
kubectl port-forward --namespace istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80
# start another terminal
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
```

4. Curl the InferenceService

First, prepare your inference input request:

```json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
```

Once you’ve created your json test input file (named something like “iris-input.json”):
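As a sketch of what this payload means: each inner list is one sample, and the four numbers are the standard iris dataset features (sepal length, sepal width, petal length, petal width, in cm) that the sample model is trained on. The file can also be generated with a few lines of stdlib Python:

```python
import json

# Each row is one iris sample: sepal length, sepal width,
# petal length, petal width (cm) -- the standard iris features.
payload = {
    "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6],
    ]
}

# Write the request body to the file the curl commands reference.
with open("iris-input.json", "w") as f:
    json.dump(payload, f, indent=2)
```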

Real DNS

If you have configured DNS, you can directly curl the InferenceService with the URL obtained from the status output, e.g.:

```shell
curl -v http://sklearn-iris.kserve-test.${CUSTOM_DOMAIN}/v1/models/sklearn-iris:predict -d @./iris-input.json
```
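Note the shape of that URL: the host is `{name}.{namespace}.{domain}` and the path is the V1 predict endpoint `/v1/models/{name}:predict`. A tiny helper that assembles it (the function name is hypothetical, not part of KServe):

```python
def predict_url(name: str, namespace: str, domain: str) -> str:
    """Build the V1 predict URL for an InferenceService."""
    return f"http://{name}.{namespace}.{domain}/v1/models/{name}:predict"

print(predict_url("sklearn-iris", "kserve-test", "example.com"))
# -> http://sklearn-iris.kserve-test.example.com/v1/models/sklearn-iris:predict
```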

Magic DNS

If you don’t want to go to the trouble of getting a real domain, you can instead use the “magic” DNS service xip.io. The key is to get the external IP for your cluster:

```shell
kubectl get svc istio-ingressgateway --namespace istio-system
```

Look for the EXTERNAL-IP column’s value (in this case 35.237.217.209):

```shell
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP
istio-ingressgateway   LoadBalancer   10.51.253.94   35.237.217.209
```

The next step is to set up the custom domain:

```shell
kubectl edit cm config-domain --namespace knative-serving
```

Now in your editor, change example.com to {{external-ip}}.xip.io (make sure to replace {{external-ip}} with the IP you found earlier).

With the change applied, you can now curl the URL directly:

```shell
curl -v http://sklearn-iris.kserve-test.35.237.217.209.xip.io/v1/models/sklearn-iris:predict -d @./iris-input.json
```

From Ingress gateway with Host header

If you do not have DNS configured, you can still curl the ingress gateway’s external IP by setting the Host header:

```shell
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -n kserve-test -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json
```
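The same two steps can be sketched in stdlib Python: pull the hostname out of the status URL (the `cut -d "/" -f 3` part), then aim the request at the ingress address while routing by the Host header. Sending requires a live cluster, so this sketch only constructs the request; the status URL and the localhost:8080 ingress values are illustrative assumptions (matching the port-forward setup above):

```python
from urllib.parse import urlparse
from urllib.request import Request

# Equivalent of: kubectl get inferenceservice ... -o jsonpath='{.status.url}' | cut -d "/" -f 3
status_url = "http://sklearn-iris.kserve-test.example.com"  # illustrative status URL
service_hostname = urlparse(status_url).netloc

# Point the request at the ingress gateway; the Host header does the routing.
ingress_host, ingress_port = "localhost", 8080  # e.g. from the port-forward setup
req = Request(
    f"http://{ingress_host}:{ingress_port}/v1/models/sklearn-iris:predict",
    data=b'{"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}',
    headers={"Host": service_hostname, "Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it against a live cluster.
```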

From local cluster gateway

If you are calling from within the cluster, you can curl the internal URL, whose host is {{InferenceServiceName}}.{{namespace}}:

```shell
curl -v http://sklearn-iris.kserve-test/v1/models/sklearn-iris:predict -d @./iris-input.json
```

5. Run Performance Test

```shell
# use kubectl create instead of apply because the job template is using generateName which doesn't work with kubectl apply
kubectl create -f https://raw.githubusercontent.com/kserve/kserve/release-0.7/docs/samples/v1beta1/sklearn/v1/perf.yaml -n kserve-test
```

Expected Output

```shell
kubectl logs load-test8b58n-rgfxr -n kserve-test
Requests      [total, rate, throughput]         30000, 500.02, 499.99
Duration      [total, attack, wait]             1m0s, 59.998s, 3.336ms
Latencies     [min, mean, 50, 90, 95, 99, max]  1.743ms, 2.748ms, 2.494ms, 3.363ms, 4.091ms, 7.749ms, 46.354ms
Bytes In      [total, mean]                     690000, 23.00
Bytes Out     [total, mean]                     2460000, 82.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:30000
Error Set:
```