KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks. It aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.
It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for production ML serving, including prediction, pre-processing, post-processing, and explainability. KServe is used across various organizations.
For more details, visit the KServe website.
Since version 0.7, KFServing has been rebranded as KServe. We still support the previous KFServing 0.5.x and 0.6.x releases; please refer to the corresponding release branch for their docs.
To learn more about KServe, how to deploy it as part of Kubeflow, how to use its supported features, and how to participate in the KServe community, please follow the KServe website documentation. Additionally, we have compiled a list of presentations and demos that dive into various details.
By default, KServe installs Knative for serverless deployment; please follow the Serverless installation guide to install KServe. If you want to install KServe without Knative (this feature is still alpha), please follow the Raw Kubernetes Deployment installation guide.
Please follow the quick install guide to install KServe on your local machine.
Please follow the getting started guide to create your first InferenceService.
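As a rough illustration of what an InferenceService resource looks like, the sketch below builds a manifest as a Python dictionary and prints it as JSON. It assumes the `serving.kserve.io/v1beta1` API with a `spec.predictor.model` section. The helper `build_inference_service`, the name `sklearn-iris`, the `sklearn` model format, and the storage URI are illustrative assumptions, as is the `serving.kserve.io/deploymentMode: RawDeployment` annotation used for the Knative-free mode. Check the getting started guide for the exact schema your KServe version expects.

```python
import json


def build_inference_service(name, model_format, storage_uri, raw_deployment=False):
    """Return a hypothetical InferenceService manifest as a plain dict.

    Assumes the serving.kserve.io/v1beta1 schema with spec.predictor.model;
    verify field names against the docs for your KServe release.
    """
    metadata = {"name": name}
    if raw_deployment:
        # Assumed annotation selecting the alpha raw Kubernetes deployment
        # mode (no Knative); omit it to keep the default serverless mode.
        metadata["annotations"] = {"serving.kserve.io/deploymentMode": "RawDeployment"}
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": metadata,
        "spec": {
            "predictor": {
                "model": {
                    # Framework of the stored model, e.g. sklearn or xgboost.
                    "modelFormat": {"name": model_format},
                    # Location KServe downloads the model artifact from.
                    "storageUri": storage_uri,
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = build_inference_service(
        name="sklearn-iris",
        model_format="sklearn",
        storage_uri="gs://kfserving-examples/models/sklearn/1.0/model",
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file such as `isvc.json` and running `kubectl apply -f isvc.json` would submit it to a cluster where KServe is installed; generating the manifest in code is just one way to template it, and a hand-written YAML file works equally well.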