Objective: "Prevent performance regressions across a known set of representative models."
- Automated Performance Tests
  - Define a set of models to test, covering a wide array of use cases and frameworks.
  - Publish performance results over time to enable regression tracking.
Objective: "Enable users to deploy latency sensitive models with KFServing."
- High Performance Dataplane
  - Enable support for gRPC or a similar high-performance protocol.
  - Continue to support the existing HTTP dataplane.
Objective: "Reduce Total Cost of Ownership when deploying multiple underutilized models."
- GPU Sharing
  - Reduce TCO by enabling models of the same framework and version to be co-hosted in a single model server.
Objective: "Improve practices around dependency management."
- Migrate to Kubebuilder 2.0.
  - Use Go Modules.
  - Stop vendoring dependencies.
- Avoid the extremely heavy dependency on TensorFlow.
- Migrate to Kubernetes 1.15.
  - Enable LabelSelectors for the Pod Mutation Webhook.
Objective: "Prevent feature regressions with greater end-to-end test coverage against a live cluster."
- Automated End-to-End Tests
  - Execute against a Kubeflow-maintained GKE cluster.
  - Execute against a Kubeflow-maintained AKS cluster.
  - Achieve >80% test coverage of supported features.
Objective: "Improve build and release processes to improve the developer experience and avoid regressions."
- Improve build reliability
  - Implement build retries.
  - Reduce PyTorch build time.
- Automated Image Injection for Model Servers
  - Implement new developer commands to deploy KFServing with local images.
- Improve versioning of XGBoost, SKLearn, and PyTorch
  - Replace the KFServing version with the corresponding framework version.
- Flexible Inference Graphs (MLGraph CRD)
  - Model Experimentation.
  - Ensembling.
  - Multi-Armed Bandit.
- Payload Logging
  - Finalize the design and implementation for Payload Logging (see the sketch after this list).
- Bias, Skew, and Outlier Detection
  - Online support in the inference graph.
  - Offline support with Payload Logging.
- Meta-Protocol Definition
  - Unify disparate protocols across frameworks.
- Adaptive batching support
  - Queue and batch requests to increase throughput.
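
Payload Logging is still being designed, but a minimal sketch can make the intent concrete. The shape below is one plausible design, assuming a `logger` field on the predictor that forwards request and response payloads as events to a user-supplied HTTP sink; the field names, the `v1alpha2` `InferenceService` shape, and the sink service are assumptions, not a committed API:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sklearn-with-logger
spec:
  default:
    predictor:
      sklearn:
        storageUri: gs://example-bucket/models/sklearn/iris  # placeholder model location
      # Assumed design: forward every request and response as events
      # to an HTTP sink, enabling offline skew and outlier analysis.
      logger:
        mode: all  # request | response | all
        url: http://message-dumper.default.svc.cluster.local  # hypothetical sink service
```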
Objective: "Continue to simplify the user experience by deeply integrating with the Kubeflow Ecosystem."
- Kubeflow Integration
  - Prepare KFServing to release v0.2 and v0.3 alongside Kubeflow v0.7.
  - Integrate with `kfctl generate` and `kfctl apply`.
  - Deploy as a Kubernetes Application.
  - Integrate with Kubeflow Pipelines to enable model deployment from a Pipeline.
  - Integrate with Fairing to enable model deployment from a Notebook.
  - Achieve 20% end-to-end test coverage of supported features. (See v0.3 for the 80% target.)
  - Support PVCs to enable integration with on-prem Kubeflow installations (see the sketch below).
  - Document installation for various cloud providers (GCP, IBM Cloud, Azure, AWS).
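
For the PVC item above, the idea is that a model's `storageUri` can point at a PersistentVolumeClaim instead of cloud blob storage, so on-prem clusters need no cloud credentials. A minimal sketch, assuming the v1alpha2 `InferenceService` shape and a pre-created claim named `models-pvc` (both names are illustrative):

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: onprem-sklearn
spec:
  default:
    predictor:
      sklearn:
        # pvc://<claim-name>/<path>: the claim is mounted and the model
        # is served from local disk rather than downloaded from a bucket.
        storageUri: pvc://models-pvc/sklearn/iris
```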
Objective: "Empower users to deeply understand their predictions and validate KFServing's static graph architecture."
- Explainability
  - Deploy a predictor and explainer powered by Alibi.
  - Deploy a predictor and explainer powered by a user-specified explainer container.
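
A sketch of the Alibi case, assuming the v1alpha2 `InferenceService` shape: the explainer is deployed alongside the predictor, `:explain` requests are routed to it, and it calls the predictor internally. The model and explainer URIs are placeholders:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: income-classifier
spec:
  default:
    predictor:
      sklearn:
        storageUri: gs://example-bucket/models/income/model  # placeholder
    # Explainer component served next to the predictor; AnchorTabular
    # produces human-readable anchor rules for tabular predictions.
    explainer:
      alibi:
        type: AnchorTabular
        storageUri: gs://example-bucket/models/income/explainer  # placeholder
```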
Objective: "Increase coverage of ML frameworks to support previously unsupported customer workloads."
- Frameworks
  - Deploy an ONNX model.
  - Explore supporting other model serialization mechanisms for certain frameworks (e.g. saving PyTorch models with dill).
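
A hedged sketch of the ONNX item, assuming an `onnx` predictor field in the v1alpha2 API; the model URI is a placeholder:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: style-transfer
spec:
  default:
    predictor:
      onnx:
        # The controller selects an ONNX-capable runtime for this model.
        storageUri: gs://example-bucket/models/style-transfer/model.onnx  # placeholder
```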
Objective: "Simplify the user experience and provide a low barrier to entry by minimizing the amount of YAML necessary to deploy a trained model."
- High Level Interfaces
  - Deploy a TensorFlow model without specifying a serving technology.
  - Deploy an XGBoost model without specifying a serving technology.
  - Deploy a ScikitLearn model without specifying a serving technology.
  - Deploy a PyTorch model without specifying a serving technology.
  - Deploy a custom containerized model by specifying your Docker image and args.
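
The "minimal YAML" objective is easiest to see with an example. A sketch, assuming the v1alpha2 `InferenceService` shape: the user specifies only the framework and a model URI, and the controller infers the serving technology (e.g. TensorFlow Serving for a TensorFlow model), container, routing, and autoscaling:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: flowers-sample
spec:
  default:
    predictor:
      tensorflow:
        # The only required input: where the trained model lives.
        # Runtime image, ports, and routes are filled in by the controller.
        storageUri: gs://kfserving-samples/models/tensorflow/flowers
```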
Objective: "Empower users to safely deploy production models by enabling a variety of deployment strategies."
- Model Rollout
  - Roll out a model using a blue-green strategy.
  - Roll out a model using a pinned strategy.
  - Roll out a model using a canary strategy (see the sketch below).
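
A sketch of a canary rollout under the same assumed v1alpha2 shape: `default` keeps serving most traffic while `canaryTrafficPercent` shifts a slice to the `canary` spec. Raising the percentage promotes the canary; setting it to 0 pins the new model in place without live traffic. The second model URI is a placeholder:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: flowers-sample
spec:
  default:
    predictor:
      tensorflow:
        storageUri: gs://kfserving-samples/models/tensorflow/flowers
  # Send 10% of traffic to the candidate model; adjust to promote or roll back.
  canaryTrafficPercent: 10
  canary:
    predictor:
      tensorflow:
        storageUri: gs://kfserving-samples/models/tensorflow/flowers-2  # placeholder
```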
Objective: "Reduce the total cost of ownership for models by minimizing the delta between provisioned resources and request load."
- Autoscaling
  - Scale a model to zero.
  - Scale a model from zero without dropping traffic.
  - Scale a model that is GPU bound.
  - Scale a model that is CPU bound.
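
A sketch of the autoscaling knobs, assuming the v1alpha2 shape where replica bounds live on each component spec and request load drives the Knative autoscaler. `minReplicas: 0` is what enables scale-to-zero; on a cold start, requests are buffered while the first pod comes up, which is how scaling from zero avoids dropping traffic:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: flowers-sample
spec:
  default:
    predictor:
      minReplicas: 0   # allow the revision to scale to zero when idle
      maxReplicas: 5   # cap replicas under burst load
      tensorflow:
        storageUri: gs://kfserving-samples/models/tensorflow/flowers
        resources:
          limits:
            nvidia.com/gpu: 1  # GPU-bound models scale on the same request signals
```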