KServe Debugging Guide

Debug KServe InferenceService Status

You deployed an InferenceService to KServe, but it is not in a ready state. Walk through this step-by-step guide to understand what failed.

    kubectl get inferenceservices sklearn-iris
    NAME           URL   READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
    sklearn-iris         False                                      1m
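
If the service is not ready, the status conditions usually say why. As a quick check (assuming your InferenceService is named sklearn-iris), you can print them directly:

    # Print the status conditions; the failing condition carries the reason and message
    kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.conditions}'

    # kubectl describe also surfaces the conditions along with related events
    kubectl describe inferenceservice sklearn-iris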

IngressNotConfigured

If you see an IngressNotConfigured error, the Istio Ingress Gateway probes are failing.

    kubectl get ksvc
    NAME                             URL                                                          LATESTCREATED                          LATESTREADY                            READY     REASON
    sklearn-iris-predictor-default   http://sklearn-iris-predictor-default.default.example.com   sklearn-iris-predictor-default-jk794   sklearn-iris-predictor-default-jk794   Unknown   IngressNotConfigured

You can then check the Knative networking-istio pod logs for more details.

    kubectl logs -l app=networking-istio -n knative-serving

If you are seeing HTTP 403, you may have Istio RBAC turned on, which blocks the probes to your service.

  1. {"level":"error","ts":"2020-03-26T19:12:00.749Z","logger":"istiocontroller.ingress-controller.status-manager","caller":"ingress/status.go:366",
  2. "msg":"Probing of http://flowers-sample-predictor-default.kubeflow-jeanarmel-luce.example.com:80/ failed, IP: 10.0.0.29:80, ready: false, error: unexpected status code: want [200], got 403 (depth: 0)",
  3. "commit":"6b0e5c6","knative.dev/controller":"ingress-controller","stacktrace":"knative.dev/serving/pkg/reconciler/ingress.(*StatusProber).processWorkItem\n\t/home/prow/go/src/knative.dev/serving/pkg/reconciler/ingress/status.go:366\nknative.dev/serving/pkg/reconciler/ingress.(*StatusProber).Start.func1\n\t/home/prow/go/src/knative.dev/serving/pkg/reconciler/ingress/status.go:268"}

RevisionMissing Error

If you see a RevisionMissing error, your service pods are not in a ready state. The Knative Service creates a Knative Revision, which represents a snapshot of the InferenceService code and configuration.

Storage Initializer fails to download model

    kubectl get revision $(kubectl get configuration sklearn-iris-predictor-default --output jsonpath="{.status.latestCreatedRevisionName}")
    NAME                                   CONFIG NAME                      K8S SERVICE NAME                       GENERATION   READY     REASON
    sklearn-iris-predictor-default-csjpw   sklearn-iris-predictor-default   sklearn-iris-predictor-default-csjpw   2            Unknown   Deploying

If the READY status is Unknown, this usually indicates that the KServe Storage Initializer init container failed to download the model. Check the init container logs to see why it failed; note that the pod scales down after some time if the init container keeps failing.

    kubectl get pod -l serving.kserve.io/inferenceservice=sklearn-iris
    NAME                                                              READY   STATUS       RESTARTS   AGE
    sklearn-iris-predictor-default-29jks-deployment-5f7d4b9996hzrnc   0/3     Init:Error   1          10s

    kubectl logs -l model=sklearn-iris -c storage-initializer
    [I 200517 03:56:19 initializer-entrypoint:13] Initializing, args: src_uri [gs://kfserving-samples/models/sklearn/iris-1] dest_path[ [/mnt/models]
    [I 200517 03:56:19 storage:35] Copying contents of gs://kfserving-samples/models/sklearn/iris-1 to local
    Traceback (most recent call last):
      File "/storage-initializer/scripts/initializer-entrypoint", line 14, in <module>
        kserve.Storage.download(src_uri, dest_path)
      File "/usr/local/lib/python3.7/site-packages/kfserving/storage.py", line 48, in download
        Storage._download_gcs(uri, out_dir)
      File "/usr/local/lib/python3.7/site-packages/kfserving/storage.py", line 116, in _download_gcs
        raise RuntimeError("Failed to fetch model. The path or model %s does not exist." % (uri))
    RuntimeError: Failed to fetch model. The path or model gs://kfserving-samples/models/sklearn/iris-1 does not exist.

After correcting the storageUri, the storage initializer logs show a successful download:

    [I 200517 03:40:19 initializer-entrypoint:13] Initializing, args: src_uri [gs://kfserving-samples/models/sklearn/iris] dest_path[ [/mnt/models]
    [I 200517 03:40:19 storage:35] Copying contents of gs://kfserving-samples/models/sklearn/iris to local
    [I 200517 03:40:20 storage:111] Downloading: /mnt/models/model.joblib
    [I 200517 03:40:20 storage:60] Successfully copied gs://kfserving-samples/models/sklearn/iris to /mnt/models
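
To confirm the storageUri is the problem before redeploying, you can check the configured value and the remote path directly. A sketch; gsutil is assumed for GCS-hosted models, and the exact spec layout depends on your KServe version:

    # Print the predictor spec to double-check the configured storageUri
    kubectl get inferenceservice sklearn-iris -o jsonpath='{.spec.predictor}'

    # For GCS models, verify the object path actually exists
    gsutil ls gs://kfserving-samples/models/sklearn/iris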

Inference Service in OOM status

If you see ExitCode137 from the revision status, the revision has failed; this usually happens when the inference service pod runs out of memory. To address it, you might need to bump up the memory limit of the InferenceService, as sketched below.

    kubectl get revision $(kubectl get configuration sklearn-iris-predictor-default --output jsonpath="{.status.latestCreatedRevisionName}")
    NAME                                   CONFIG NAME                      K8S SERVICE NAME                       GENERATION   READY   REASON
    sklearn-iris-predictor-default-84bzf   sklearn-iris-predictor-default   sklearn-iris-predictor-default-84bzf   8            False   ExitCode137
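
A minimal sketch of bumping the memory limit on the predictor. This assumes the new-style model spec; older KServe versions use a runtime-specific field such as sklearn instead of model:

    kubectl apply -f - <<EOF
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: sklearn-iris
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: gs://kfserving-samples/models/sklearn/iris
          resources:
            requests:
              memory: 1Gi
            limits:
              memory: 2Gi  # raise this if the pod keeps exiting with code 137
    EOF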

Inference Service fails to start

If you see other exit codes from the revision status, you can further check the pod status.

    kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris
    NAME                                                              READY   STATUS             RESTARTS   AGE
    sklearn-iris-predictor-default-rvhmk-deployment-867c6444647tz7n   1/3     CrashLoopBackOff   3          80s

If you see CrashLoopBackOff, check the kserve-container log for more details on where it fails; the error is usually propagated to the revision container status as well.

    kubectl logs sklearn-iris-predictor-default-rvhmk-deployment-867c6444647tz7n kserve-container
    [I 200517 04:58:21 storage:35] Copying contents of /mnt/models to local
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/sklearnserver/sklearnserver/__main__.py", line 33, in <module>
        model.load()
      File "/sklearnserver/sklearnserver/model.py", line 36, in load
        model_file = next(path for path in paths if os.path.exists(path))
    StopIteration
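
Since the error is propagated to the revision status, you can also read it from the revision itself; for example:

    # The container failure is reflected in the revision's status conditions
    kubectl get revision sklearn-iris-predictor-default-rvhmk -o jsonpath='{.status.conditions}'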

Inference Service cannot fetch docker images from AWS ECR

If you don't see the inference service created at all for custom images from private registries (such as AWS ECR), the Knative Serving Controller may be failing to authenticate itself against the registry, logging an error like:

    failed to resolve image to digest: failed to fetch image information: unsupported status code 401; body: Not Authorized

You can verify that this is actually the case by spinning up a pod that uses your image: the pod should be able to fetch it if the correct IAM roles are attached, while Knative is not able to (a quick check is sketched below). To circumvent this issue, you can either skip tag resolution or provide certificates for your registry, as detailed in the official Knative docs.
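
A quick way to run that verification (a sketch; substitute your own ECR image URI):

    # If this pod starts, the node can pull the image with its IAM role,
    # so the 401 is coming from the Knative controller's tag-to-digest resolution
    kubectl run ecr-pull-test --restart=Never \
      --image={account_id}.dkr.ecr.{region}.amazonaws.com/myrepo/myimage:latest
    kubectl get pod ecr-pull-test
    kubectl delete pod ecr-pull-test

To skip tag resolution, edit the config-deployment ConfigMap: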

    kubectl -n knative-serving edit configmap config-deployment

The resulting YAML should look something like the following.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config-deployment
      namespace: knative-serving
    data:
      # List of repositories for which tag to digest resolving should be skipped (for AWS ECR: {account_id}.dkr.ecr.{region}.amazonaws.com)
      registriesSkippingTagResolving: registry.example.com

Debug KServe Request Flow

    +----------------------+        +-----------------------+      +--------------------------+
    |Istio Virtual Service |        |Istio Virtual Service  |      | K8S Service              |
    |                      |        |                       |      |                          |
    |sklearn-iris          |        |sklearn-iris-predictor |      | sklearn-iris-predictor   |
    |                      +------->|-default               +----->| -default-$revision       |
    |                      |        |                       |      |                          |
    |KServe Route          |        |Knative Route          |      | Knative Revision Service |
    +----------------------+        +-----------------------+      +------------+-------------+
     Knative Ingress Gateway          Knative Local Gateway            Kube Proxy
        (Istio gateway)                  (Istio gateway)                        |
                                                                                |
                                                                                |
    +-------------------------------------------------------+                  |
    |  Knative Revision Pod                                  |                  |
    |                                                        |                  |
    |  +-------------------+      +-----------------+       |                  |
    |  |                   |      |                 |       |                  |
    |  |kserve-container   |<-----+ Queue Proxy     |<------+-------------------+
    |  |                   |      |                 |       |
    |  +-------------------+      +--------------^--+       |
    |                                            |          |
    +-----------------------^--------------------+----------+
                            | scale deployment   |
                   +--------+--------+           | pull metrics
                   |  Knative        |           |
                   |  Autoscaler     +-----------+
                   |  KPA/HPA        |
                   +-----------------+

1. Traffic arrives through the Knative Ingress Gateway for external traffic or the Knative Local Gateway for internal traffic.

The Istio Gateway resource describes the edge of the mesh, receiving incoming or outgoing HTTP/TCP connections. The specification describes a set of ports that should be exposed and the type of protocol to use. In standalone mode, the Gateway is installed in the knative-serving namespace; with Kubeflow KServe (KServe installed with Kubeflow), the Gateway is installed in the kubeflow namespace, e.g. on GCP the gateway is protected behind IAP with an Istio authentication policy.

    kubectl get gateway knative-ingress-gateway -n knative-serving -oyaml

    kind: Gateway
    metadata:
      labels:
        networking.knative.dev/ingress-provider: istio
        serving.knative.dev/release: v0.12.1
      name: knative-ingress-gateway
      namespace: knative-serving
    spec:
      selector:
        istio: ingressgateway
      servers:
      - hosts:
        - '*'
        port:
          name: http
          number: 80
          protocol: HTTP
      - hosts:
        - '*'
        port:
          name: https
          number: 443
          protocol: HTTPS
        tls:
          mode: SIMPLE
          privateKey: /etc/istio/ingressgateway-certs/tls.key
          serverCertificate: /etc/istio/ingressgateway-certs/tls.crt

The InferenceService request routes to the Istio Ingress Gateway by matching the host and port from the URL. By default HTTP is configured; you can configure HTTPS with TLS certificates.
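
To exercise this hop end to end, you can curl the model through the ingress gateway with the expected Host header. A sketch, assuming the istio-ingressgateway service lives in istio-system and you have an iris-input.json request payload:

    INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
      -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris \
      -o jsonpath='{.status.url}' | cut -d "/" -f 3)

    # The Host header is what the gateway and virtual services match on
    curl -v -H "Host: ${SERVICE_HOSTNAME}" \
      http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict \
      -d @./iris-input.json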

2. The KServe Istio virtual service routes to the predictor, transformer, or explainer.

    kubectl get vs sklearn-iris -oyaml

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: sklearn-iris
      namespace: default
    spec:
      gateways:
      - knative-serving/knative-local-gateway
      - knative-serving/knative-ingress-gateway
      hosts:
      - sklearn-iris.default.svc.cluster.local
      - sklearn-iris.default.example.com
      http:
      - headers:
          request:
            set:
              Host: sklearn-iris-predictor-default.default.svc.cluster.local
        match:
        - authority:
            regex: ^sklearn-iris\.default(\.svc(\.cluster\.local)?)?(?::\d{1,5})?$
          gateways:
          - knative-serving/knative-local-gateway
        - authority:
            regex: ^sklearn-iris\.default\.example\.com(?::\d{1,5})?$
          gateways:
          - knative-serving/knative-ingress-gateway
        route:
        - destination:
            host: knative-local-gateway.istio-system.svc.cluster.local
            port:
              number: 80
          weight: 100

KServe creates the routing rule, which by default routes to the predictor if you only have a predictor specified on the InferenceService. When a transformer and explainer are specified on the InferenceService, the routing rule sends traffic to the transformer or explainer based on the verb (see the example below). The request then routes to the second-level, Knative-created virtual service via the local gateway with the matching host header.
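
With the v1 protocol the verb is the suffix on the model path, so the same host serves both; reusing the variables from the earlier curl sketch:

    # Routed to the predictor (through the transformer, if one is configured)
    curl -H "Host: ${SERVICE_HOSTNAME}" \
      http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:predict -d @./iris-input.json

    # Routed to the explainer, if one is specified on the InferenceService
    curl -H "Host: ${SERVICE_HOSTNAME}" \
      http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris:explain -d @./iris-input.json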

3. The Knative Istio virtual service routes the inference request to the latest ready revision.

    kubectl get vs sklearn-iris-predictor-default-ingress -oyaml

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: sklearn-iris-predictor-default-ingress
      namespace: default
    spec:
      gateways:
      - knative-serving/knative-ingress-gateway
      - knative-serving/knative-local-gateway
      hosts:
      - sklearn-iris-predictor-default.default
      - sklearn-iris-predictor-default.default.example.com
      - sklearn-iris-predictor-default.default.svc
      - sklearn-iris-predictor-default.default.svc.cluster.local
      http:
      - match:
        - authority:
            prefix: sklearn-iris-predictor-default.default
          gateways:
          - knative-serving/knative-local-gateway
        - authority:
            prefix: sklearn-iris-predictor-default.default.svc
          gateways:
          - knative-serving/knative-local-gateway
        - authority:
            prefix: sklearn-iris-predictor-default.default.svc.cluster.local
          gateways:
          - knative-serving/knative-local-gateway
        retries: {}
        route:
        - destination:
            host: sklearn-iris-predictor-default-00001.default.svc.cluster.local
            port:
              number: 80
          headers:
            request:
              set:
                Knative-Serving-Namespace: default
                Knative-Serving-Revision: sklearn-iris-predictor-default-00001
          weight: 100
      - match:
        - authority:
            prefix: sklearn-iris-predictor-default.default.example.com
          gateways:
          - knative-serving/knative-ingress-gateway
        retries: {}
        route:
        - destination:
            host: sklearn-iris-predictor-default-00001.default.svc.cluster.local
            port:
              number: 80
          headers:
            request:
              set:
                Knative-Serving-Namespace: default
                Knative-Serving-Revision: sklearn-iris-predictor-default-00001
          weight: 100

The destination here is the Kubernetes Service for the latest ready Knative Revision, and it is reconciled by Knative every time a user rolls out a new revision. When a new revision is rolled out and ready, the old revision is scaled down; after the configured revision GC time, the revision resource is garbage collected if it no longer has traffic referencing it.

4. The Kubernetes Service routes requests to the queue proxy sidecar of the inference service pod on port 8012.

    kubectl get svc sklearn-iris-predictor-default-fhmjk-private -oyaml

    apiVersion: v1
    kind: Service
    metadata:
      name: sklearn-iris-predictor-default-fhmjk-private
      namespace: default
    spec:
      clusterIP: 10.105.186.18
      ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: 8012
      - name: queue-metrics
        port: 9090
        protocol: TCP
        targetPort: queue-metrics
      - name: http-usermetric
        port: 9091
        protocol: TCP
        targetPort: http-usermetric
      - name: http-queueadm
        port: 8022
        protocol: TCP
        targetPort: 8022
      selector:
        serving.knative.dev/revisionUID: a8f1eafc-3c64-4930-9a01-359f3235333a
      sessionAffinity: None
      type: ClusterIP

5. The queue proxy routes to the kserve-container, with the maximum number of concurrent requests configured by ContainerConcurrency (see the sketch below).

If the queue proxy receives more requests than it can handle, the Knative Autoscaler creates more pods to handle them.
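
ContainerConcurrency is set on the component spec of the InferenceService. A minimal sketch, again assuming the new-style model spec:

    kubectl apply -f - <<EOF
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: sklearn-iris
    spec:
      predictor:
        containerConcurrency: 10  # hard cap on in-flight requests per pod
        model:
          modelFormat:
            name: sklearn
          storageUri: gs://kfserving-samples/models/sklearn/iris
    EOF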

6. Finally, the queue proxy routes traffic to the kserve-container for processing the inference requests.
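
If requests make it this far but still fail, comparing the two containers' logs shows where the hand-off breaks; for example:

    # Queue proxy logs for the revision pods of this InferenceService
    kubectl logs -l serving.kserve.io/inferenceservice=sklearn-iris -c queue-proxy

    # Model server logs
    kubectl logs -l serving.kserve.io/inferenceservice=sklearn-iris -c kserve-container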