Model serving overview

Kubeflow supports two model serving systems that allow multi-framework model serving: KFServing and Seldon Core. Alternatively, you can use a standalone model serving system. This page gives an overview of the options, so that you can choose the framework that best supports your model serving requirements.

Multi-framework serving with KFServing or Seldon Core

KFServing and Seldon Core are both open source systems that allow multi-framework model serving. The following table compares KFServing and Seldon Core. A check mark (✓) indicates that the system (KFServing or Seldon Core) supports the feature specified in that row; "sample" and "docs" refer to the corresponding example or documentation on the project site.

| Feature   | Sub-feature                       | KFServing  | Seldon Core         |
|-----------|-----------------------------------|------------|---------------------|
| Framework | TensorFlow                        | ✓ (sample) | ✓ (docs)            |
|           | XGBoost                           | ✓ (sample) | ✓ (docs)            |
|           | scikit-learn                      | ✓ (sample) | ✓ (docs)            |
|           | NVIDIA TensorRT Inference Server  | ✓ (sample) | ✓ (docs)            |
|           | ONNX                              | ✓ (sample) | ✓ (docs)            |
|           | PyTorch                           | ✓ (sample) |                     |
| Graph     | Transformers                      | ✓ (sample) | ✓ (docs)            |
|           | Combiners                         | Roadmap    | ✓ (sample)          |
|           | Routers including MAB             | Roadmap    | ✓ (docs)            |
| Analytics | Explanations                      | ✓ (sample) | ✓ (docs)            |
| Scaling   | Knative                           | ✓ (sample) |                     |
|           | GPU AutoScaling                   | ✓ (sample) |                     |
|           | HPA                               |            | ✓ (docs)            |
| Custom    | Container                         | ✓ (sample) | ✓ (docs)            |
|           | Language Wrappers                 |            | ✓ (Python, Java, R) |
|           | Multi-Container                   |            | ✓ (docs)            |
| Rollout   | Canary                            | ✓ (sample) | ✓ (docs)            |
|           | Shadow                            |            | ✓                   |
| Istio     |                                   | ✓          | ✓                   |
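To make the comparison concrete, here is a minimal KFServing InferenceService manifest for a scikit-learn model. It follows the shape of KFServing's published samples for the v1alpha2 API that was current when this page was written; the storageUri points at KFServing's public sample bucket, and swapping the `sklearn` predictor for `tensorflow`, `xgboost`, or `onnx` is how the multi-framework support in the table is exposed:

```yaml
# Sketch of a KFServing InferenceService (v1alpha2-era API).
# The sklearn predictor block selects the framework; KFServing
# pulls the model artifacts from the given storageUri.
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
```

Applying the manifest with `kubectl apply -f` creates the service; KFServing then provisions the serving runtime for the chosen framework behind a single prediction endpoint.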

Notes:

  • KFServing and Seldon Core share some technical features, including explainability (using Seldon Alibi Explain) and payload logging, among other areas.
  • A commercial product, Seldon Deploy, supports both KFServing and Seldon Core in production.
  • KFServing is part of the Kubeflow project ecosystem. Seldon Core is an external project supported within Kubeflow.


TensorFlow Serving

For TensorFlow models you can use TensorFlow Serving for real-time prediction. However, if you plan to use multiple frameworks, you should consider KFServing or Seldon Core as described above.
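If you do use TensorFlow Serving on its own, predictions are typically requested over its REST API, which serves at `/v1/models/<model_name>:predict` and accepts a JSON body in the "instances" format. The sketch below only builds the request URL and body; the model name and server address are illustrative placeholders, not values from this page:

```python
import json

# Hypothetical model name and server address for illustration.
MODEL_NAME = "half_plus_two"
SERVER = "http://localhost:8501"

def predict_url(server: str, model_name: str) -> str:
    """Build the TensorFlow Serving REST prediction URL."""
    return f"{server}/v1/models/{model_name}:predict"

def predict_body(instances) -> str:
    """Serialize inputs in the 'instances' format the predict API expects."""
    return json.dumps({"instances": instances})

url = predict_url(SERVER, MODEL_NAME)
body = predict_body([1.0, 2.0, 5.0])
```

In practice you would POST `body` to `url` with any HTTP client and read the `predictions` field from the JSON response.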

NVIDIA TensorRT Inference Server

NVIDIA TensorRT Inference Server is a REST and gRPC service for deep-learning inference of TensorRT, TensorFlow and Caffe2 models. The server is optimized to deploy machine learning algorithms on both GPUs and CPUs at scale.

You can use NVIDIA TensorRT Inference Server as a standalone system, but you should consider KFServing as described above. KFServing includes support for NVIDIA TensorRT Inference Server.
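As a sketch of what that support looks like, a KFServing InferenceService can point a `tensorrt` predictor at a model repository. The bucket path below is a hypothetical placeholder, and the predictor field name reflects the v1alpha2-era KFServing API assumed throughout this page:

```yaml
# Sketch: serving a TensorRT model repository via KFServing
# (v1alpha2-era API; the storageUri is a hypothetical example).
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: tensorrt-example
spec:
  default:
    predictor:
      tensorrt:
        storageUri: "gs://example-bucket/models/tensorrt"
```

This lets you keep the Inference Server's GPU-optimized runtime while gaining KFServing's deployment, scaling, and rollout machinery.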


Last modified 27.01.2020: Deleted the PyTorch Serving page (#1555) (923ea2ac)