Deploy Transformer with InferenceService

Transformer is an InferenceService component which does pre/post processing alongside model inference. It usually takes raw input and transforms it into the input tensors the model server expects. In this example we demonstrate running inference with a Transformer and a TorchServe Predictor.

Build Transformer image

The kserve.KFModel base class mainly defines three handlers: preprocess, predict and postprocess. These handlers are executed in sequence: the output of preprocess is passed to predict as its input, and when predictor_host is set the predict handler by default makes an HTTP call to the predictor URL and passes the response on to the postprocess handler. KServe automatically fills in predictor_host for the Transformer and handles the call to the Predictor; for a gRPC predictor you currently need to override the predict handler to make the gRPC call.

To implement a Transformer you can derive from the base KFModel class and then override the preprocess and postprocess handlers with your own customized transformation logic.

Extend KFModel and implement pre/post processing functions

```python
import kserve
from typing import List, Dict
from PIL import Image
import torchvision.transforms as transforms
import logging
import io
import numpy as np
import base64

logging.basicConfig(level=kserve.constants.KSERVE_LOGLEVEL)

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])


def image_transform(instance):
    """Decode the base64 image bytes and apply the torchvision transform."""
    byte_array = base64.b64decode(instance['image_bytes']['b64'])
    image = Image.open(io.BytesIO(byte_array))
    a = np.asarray(image)
    im = Image.fromarray(a)
    res = transform(im)
    logging.info(res)
    return res.tolist()


class ImageTransformer(kserve.KFModel):
    def __init__(self, name: str, predictor_host: str):
        super().__init__(name)
        self.predictor_host = predictor_host

    def preprocess(self, inputs: Dict) -> Dict:
        # Transform each raw instance into the tensor list the predictor expects.
        return {'instances': [image_transform(instance) for instance in inputs['instances']]}

    def postprocess(self, inputs: Dict) -> Dict:
        # Pass the predictor response through unchanged.
        return inputs
```

Please see the code example here
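To run the transformer as a server, the class also needs an entry point that registers it with the KServe model server. The following is a minimal sketch only: the --model_name and --predictor_host flags and the kserve.KFServer usage follow the common pattern for the KServe version that exposes KFModel (newer releases expose ModelServer instead), and it assumes the ImageTransformer class above is defined in the same module.

```python
import argparse

import kserve

# Hypothetical entry point for the transformer image; flag names are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", default="mnist",
                    help="The name the model is served under.")
parser.add_argument("--predictor_host", required=True,
                    help="The host name of the predictor service.")
args, _ = parser.parse_known_args()

if __name__ == "__main__":
    transformer = ImageTransformer(args.model_name, predictor_host=args.predictor_host)
    # Start the model server; requests for the registered model name are routed
    # through preprocess -> predict (HTTP call to the predictor) -> postprocess.
    kserve.KFServer().start(models=[transformer])
```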

Build Transformer docker image

```bash
docker build -t {username}/image-transformer:latest -f transformer.Dockerfile .
docker push {username}/image-transformer:latest
```
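The contents of transformer.Dockerfile are not reproduced here; below is a minimal sketch of what such a Dockerfile could look like. The base image, the file names image_transformer.py and requirements.txt, and the entry point are assumptions for illustration, not the exact files behind the published image.

```dockerfile
# Hypothetical sketch of transformer.Dockerfile; names and base image are assumptions.
FROM python:3.8-slim

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY image_transformer.py .

# KServe passes arguments such as --model_name and --predictor_host at runtime.
ENTRYPOINT ["python", "image_transformer.py"]
```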

Create the InferenceService

Please use the following YAML to create the InferenceService, which includes a Transformer and a PyTorch Predictor.

By default InferenceService uses TorchServe to serve PyTorch models, and the models are loaded from a model repository in the KServe example GCS bucket according to the TorchServe model repository layout. The model repository contains an mnist model, but you can store more than one model there. In the Transformer image you can create a single transformer class for all the models in the repository if they can share the same transformation logic, or maintain a map from model name to transformer class so KServe knows which transformer to use for the corresponding model (a hypothetical sketch of this follows the YAML below).

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: torchserve-transformer
spec:
  transformer:
    containers:
      - image: kfserving/torchserve-image-transformer:latest
        name: kserve-container
        env:
          - name: STORAGE_URI
            value: gs://kfserving-examples/models/torchserve/image_classifier
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier
```
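If the repository holds several models that share the same preprocessing, a hypothetical way to wire them up (the model names below are placeholders, and the argument parsing is reused from the entry-point sketch earlier) is to register one transformer instance per model name when starting the server:

```python
# Hypothetical sketch: one ImageTransformer instance per model in the repository.
model_names = ["mnist", "cifar10"]  # placeholder names
transformers = [
    ImageTransformer(name, predictor_host=args.predictor_host)
    for name in model_names
]
kserve.KFServer().start(models=transformers)
```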

Note

The STORAGE_URI environment variable is a built-in env var that injects the storage initializer for a custom container, just like the storageUri field does for prepackaged predictors; the downloaded artifacts are stored under /mnt/models.

Apply the CRD

```bash
kubectl apply -f transformer.yaml
```

Expected Output

```
$ inferenceservice.serving.kserve.io/torchserve-transformer created
```

Run a prediction

The first step is to determine the ingress IP and ports and set INGRESS_HOST and INGRESS_PORT.
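How you determine these values depends on your cluster. One common approach, assuming an Istio ingress gateway in the istio-system namespace and a local port-forward (which matches the port 8080 that shows up in the output below), is:

```bash
# Port-forward the Istio ingress gateway locally; the namespace and the
# app=istio-ingressgateway label are assumptions, adjust them for your cluster.
INGRESS_GATEWAY_SERVICE=$(kubectl get svc -n istio-system -l app=istio-ingressgateway -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n istio-system svc/${INGRESS_GATEWAY_SERVICE} 8080:80 &
INGRESS_HOST=localhost
INGRESS_PORT=8080
```

With INGRESS_HOST and INGRESS_PORT set, send the prediction request: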

```bash
SERVICE_NAME=torchserve-transformer
MODEL_NAME=mnist
INPUT_PATH=@./input.json
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
```
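The file referenced by INPUT_PATH must match the shape the preprocess handler above reads: a list of instances, each carrying a base64-encoded image under image_bytes.b64. A truncated sketch (only the field read by preprocess is shown and the payload is elided):

```json
{
  "instances": [
    {"image_bytes": {"b64": "<base64-encoded image>"}}
  ]
}
```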

Expected Output

```
> POST /v1/models/mnist:predict HTTP/1.1
> Host: torchserve-transformer.default.example.com
> User-Agent: curl/7.73.0
> Accept: */*
> Content-Length: 401
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 401 out of 401 bytes
Handling connection for 8080
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< content-length: 20
< content-type: application/json; charset=UTF-8
< date: Tue, 12 Jan 2021 09:52:30 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 83
<
* Connection #0 to host localhost left intact
{"predictions": [2]}
```