BentoML

Model serving with BentoML

Out of date

This guide contains outdated information pertaining to Kubeflow 1.0 and needs to be updated for Kubeflow 1.1.

This guide demonstrates how to serve a scikit-learn based iris classifier model with BentoML on a Kubernetes cluster. The same deployment steps apply to models trained with other machine learning frameworks; see the BentoML repository for more examples.

BentoML is an open-source platform for high-performance ML model serving. It makes building production API endpoints for your ML models easy and supports all major machine learning training frameworks, including TensorFlow, Keras, PyTorch, XGBoost, and scikit-learn.

BentoML comes with a high-performance API model server with adaptive micro-batching support, which brings the throughput advantages of batch processing to online serving. It also provides model management and model deployment functionality, giving ML teams an end-to-end model serving workflow with DevOps best practices baked in.

Prerequisites

Before starting this tutorial, make sure you have the following:

- A Kubernetes cluster and the kubectl CLI configured to access it
- Docker installed, and a Docker Hub account for pushing the model server image
- Python 3.6 or above, with the bentoml and scikit-learn packages installed
- jq, used below to parse the BentoML CLI output

Build an iris classifier model server with BentoML

The following code defines a BentoML prediction service that requires a scikit-learn model and asks BentoML to infer the required PyPI packages automatically. It also defines an API, the entry point for accessing this prediction service; the API expects a pandas.DataFrame object as its input data.

    # iris_classifier.py
    from bentoml import env, artifacts, api, BentoService
    from bentoml.handlers import DataframeHandler
    from bentoml.artifact import SklearnModelArtifact

    @env(auto_pip_dependencies=True)
    @artifacts([SklearnModelArtifact('model')])
    class IrisClassifier(BentoService):

        @api(DataframeHandler)
        def predict(self, df):
            return self.artifacts.model.predict(df)

The following code trains a classifier model and packages it with the IrisClassifier service defined above:

    # main.py
    from sklearn import svm
    from sklearn import datasets

    from iris_classifier import IrisClassifier

    if __name__ == "__main__":
        # Load training data
        iris = datasets.load_iris()
        X, y = iris.data, iris.target

        # Model training
        clf = svm.SVC(gamma='scale')
        clf.fit(X, y)

        # Create an iris classifier service instance
        iris_classifier_service = IrisClassifier()

        # Pack the newly trained model artifact
        iris_classifier_service.pack('model', clf)

        # Save the prediction service to disk for model serving
        saved_path = iris_classifier_service.save()

The sample code above can be found in the BentoML repository; run it directly with the following commands:

    git clone git@github.com:bentoml/BentoML.git
    python ./BentoML/guides/quick-start/main.py
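
After running main.py, you can verify that the prediction service was saved to the local BentoML repository; the same bentoml get command is used later to locate the saved bundle:

    # Show the metadata of the latest saved IrisClassifier bundle
    bentoml get IrisClassifier:latest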

After saving the BentoService instance, you can start a REST API server with the trained model and test the API server locally:

    # Start BentoML API server:
    bentoml serve IrisClassifier:latest

    # Send test request
    curl -i \
      --header "Content-Type: application/json" \
      --request POST \
      --data '[[5.1, 3.5, 1.4, 0.2]]' \
      localhost:5000/predict

BentoML provides a convenient way to containerize the model API server with Docker. To create a Docker container image for the sample model above:

  1. Find the file directory of the SavedBundle with the bentoml get command; the directory is structured as a Docker build context.
  2. Run docker build with this directory to produce a Docker image containing the model API server.
    saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")

    # Replace {docker_username} with your Docker Hub username
    docker build -t {docker_username}/iris-classifier $saved_path
    docker push {docker_username}/iris-classifier
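
Optionally, you can smoke test the image locally before deploying it. The API server inside the container listens on port 5000 (the same port exposed in the Kubernetes manifests below), so a quick local check looks like this:

    # Run the containerized model API server locally
    docker run -p 5000:5000 {docker_username}/iris-classifier

    # In another terminal, send the same test request as before
    curl -i \
      --header "Content-Type: application/json" \
      --request POST \
      --data '[[5.1, 3.5, 1.4, 0.2]]' \
      localhost:5000/predict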

Deploy model server to Kubernetes

The following is an example YAML file for specifying the resources required to run and expose a BentoML model server in a Kubernetes cluster. Replace {docker_username} with your Docker Hub username and save it to iris-classifier.yaml:

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: iris-classifier
      name: iris-classifier
      namespace: kubeflow
    spec:
      ports:
      - name: predict
        port: 5000
        targetPort: 5000
      selector:
        app: iris-classifier
      type: LoadBalancer
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: iris-classifier
      name: iris-classifier
      namespace: kubeflow
    spec:
      selector:
        matchLabels:
          app: iris-classifier
      template:
        metadata:
          labels:
            app: iris-classifier
        spec:
          containers:
          - image: {docker_username}/iris-classifier
            imagePullPolicy: IfNotPresent
            name: iris-classifier
            ports:
            - containerPort: 5000

Use the kubectl CLI to deploy the model API server to the Kubernetes cluster:

    kubectl apply -f iris-classifier.yaml
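
To verify that the resources were created and the pod is running (the app: iris-classifier label comes from the YAML above):

    # Check the deployment's pod
    kubectl get pods --namespace kubeflow -l app=iris-classifier

    # Check the service and its ports
    kubectl get svc iris-classifier --namespace kubeflow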

Send prediction request

Use the kubectl describe command to get the NODE_PORT:

    kubectl describe svc iris-classifier --namespace kubeflow
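
Alternatively, the node port can be extracted directly with a jsonpath query, for example:

    # Store the service's node port in a shell variable
    NODE_PORT=$(kubectl get svc iris-classifier --namespace kubeflow \
      -o jsonpath='{.spec.ports[0].nodePort}')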

And then send the request:

    curl -i \
      --header "Content-Type: application/json" \
      --request POST \
      --data '[[5.1, 3.5, 1.4, 0.2]]' \
      http://EXTERNAL_IP:NODE_PORT/predict

Monitor metrics with Prometheus

Prerequisites

Before starting this section, make sure you have the following:

- A Prometheus deployment running in your Kubernetes cluster, configured to discover scrape targets from pod annotations

The BentoML API server provides Prometheus support out of the box. It exposes a "/metrics" endpoint that includes the essential metrics for model serving, along with the ability to create and customize new metrics based on your needs.
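
For example, with the API server running locally (see the bentoml serve step above), you can inspect the exposed metrics directly:

    # Fetch Prometheus-format metrics from the local API server
    curl localhost:5000/metrics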

To enable Prometheus monitoring on the deployed model API server, update the YAML file with Prometheus-related annotations. Change the Deployment spec as follows (note that Kubernetes annotation values must be quoted strings), and replace {docker_username} with your Docker Hub username:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: iris-classifier
      name: iris-classifier
      namespace: kubeflow
    spec:
      selector:
        matchLabels:
          app: iris-classifier
      template:
        metadata:
          labels:
            app: iris-classifier
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/port: "5000"
        spec:
          containers:
          - image: {docker_username}/iris-classifier
            imagePullPolicy: IfNotPresent
            name: iris-classifier
            ports:
            - containerPort: 5000

Apply the change with the kubectl CLI:

    kubectl apply -f iris-classifier.yaml
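
As a quick check, you can confirm that the Prometheus annotations are present on the new pod:

    # Print the annotations of the running pod
    kubectl get pods --namespace kubeflow -l app=iris-classifier \
      -o jsonpath='{.items[0].metadata.annotations}'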

Remove deployment

    kubectl delete -f iris-classifier.yaml

Additional resources

- BentoML repository: https://github.com/bentoml/BentoML
- BentoML documentation: https://docs.bentoml.org
