Commit: Support for gcs access without credentials and multi-model serving e2e test with sklearn/xgboost examples + docs (kubeflow#1306)

* Imported trained model spec and model spec
* Update description: trained model -> TrainedModel
* Considers model name in predict query
* Multi model serving test for sklearn and xgboost
* Constants for v1alpha versions
* Provider client is generated when downloading
* New method to create/deploy trained model object
* Example on running multi model serving
* Snake case variable -> Camel case
* Using new version constants
* CreateProviderIfNotExists will return provider and error
* Updated to use GetProvider
* Corrected the file path
* Removed object file path when creating fileName
* Added overview of multi-model serving
* Overview of inferenceservice, trainedmodel, and model agent
* Removed check for version in create_trained_model
* Multi-model serving example for sklearn
* Moved provider creation to package agent storage
* Fixed up confusing wording
* Fixed up typo
* Included detailed diagram
* Added general overview

Showing 19 changed files with 607 additions and 75 deletions.
# Multi-Model Serving

## Introduction

### Problem

With machine learning approaches becoming more widely adopted in organizations, there is a trend toward deploying many models: separate models per user give a more personalized experience, and training them separately isolates each user's data, which helps with data privacy.

When KFServing was originally designed, it followed the one-model, one-server paradigm, which presents a challenge for a Kubernetes cluster when users want to deploy many models. For example, Kubernetes sets a default limit of 110 pods per node, so a 100-node cluster can host at most 11,000 pods, which is often not enough. Additionally, there is no easy way to request a fraction of a GPU in Kubernetes, so it makes sense to load multiple models into one model server to share GPU resources. KFServing's multi-model serving solves this by loading multiple models into one server while still keeping the out-of-the-box serverless features.
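For reference, the 110-pod ceiling mentioned above is the kubelet's default `maxPods` value. A minimal `KubeletConfiguration` snippet showing the default (for illustration only; multi-model serving does not require changing it):

```yaml
# Kubelet configuration file; maxPods defaults to 110 pods per node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 110
```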
### Benefits
- Allow multiple models to share the same GPU
- Increase the total number of models that can be deployed in a cluster
- Reduce model deployment resource overhead
  - An InferenceService needs some CPU and memory overhead for each replica
  - Loading multiple models into one InferenceService is more resource efficient
- Allow deploying hundreds of thousands of models with ease, and monitoring deployed trained models at scale

### Design
![Multi-model Diagram](./diagrams/mms-design.png)
### Integration with model servers
Multi-model serving works with any model server that implements the KFServing V2 protocol. More specifically, if the model server implements the load and unload endpoints, then it can use KFServing's TrainedModel.
Currently, the only supported model servers are Triton, SKLearn, and XGBoost. See the [Triton](https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/triton/multimodel) and [SKLearn](https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/sklearn/multimodel) examples to see how to run multi-model serving!
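As an illustration, the load and unload calls follow the V2 repository extension. Against a model server listening locally they would look roughly like this; the exact paths may vary by server and version, and the model agent issues these calls on your behalf:

```bash
# Illustration only: load, then unload, a model named example-model.
curl -X POST http://localhost:8080/v2/repository/models/example-model/load
curl -X POST http://localhost:8080/v2/repository/models/example-model/unload
```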

For more in-depth details, check out this [document](https://docs.google.com/document/d/11qETyR--oOIquQke-DCaLsZY75vT1hRu21PesSUDy7o).
# Multi-Model Serving with Sklearn

## Overview

The general flow of multi-model serving:
1. Deploy an InferenceService with the framework specified
2. Deploy TrainedModel(s) with the storageUri, framework, and memory
3. A config map is created containing details about each trained model
4. The model agent loads models from the model config
5. An endpoint is set up and is ready to serve the model(s) (see the URL sketch after this list)
6. Deleting a model removes it from the config map, which causes the model agent to unload it
7. Deleting the InferenceService causes its TrainedModel(s) to be deleted
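Once loaded, each trained model is addressable by name on the parent InferenceService's predict endpoint. A sketch of the URL pattern, using placeholder names (the full worked example follows below):

```bash
# <trainedmodel-name> is the TrainedModel's metadata.name;
# the Host header is the parent InferenceService's hostname.
curl -H "Host: <inferenceservice-hostname>" \
  http://<ingress-host>:<ingress-port>/v1/models/<trainedmodel-name>:predict \
  -d @input.json
```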

## Example
First, you should have KFServing installed. Check [this](https://github.com/kubeflow/kfserving#install-kfserving) out if you have not installed it yet.

The content below is in the file `inferenceservice.yaml`.

```yaml
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-example"
spec:
  predictor:
    minReplicas: 1
    sklearn:
      protocolVersion: v1
      name: "sklearn-iris-predictor"
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 100m
          memory: 256Mi
```
Run `kubectl apply -f inferenceservice.yaml` to create the InferenceService. Check that it is properly deployed by running `kubectl get inferenceservice`. The output should be similar to the below.
```yaml
NAME                   URL                                                READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                             AGE
sklearn-iris-example   http://sklearn-iris-example.default.example.com   True           100                              sklearn-iris-example-predictor-default-kgtql   22s
```

Next, the file with the trained models, `trainedmodels.yaml`, is shown below.
```yaml
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "TrainedModel"
metadata:
  name: "model1-sklearn"
spec:
  inferenceService: "sklearn-iris-example"
  model:
    storageUri: "gs://kfserving-samples/models/sklearn/iris"
    framework: "sklearn"
    memory: "256Mi"
---
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "TrainedModel"
metadata:
  name: "model2-sklearn"
spec:
  inferenceService: "sklearn-iris-example"
  model:
    storageUri: "gs://kfserving-samples/models/sklearn/iris"
    framework: "sklearn"
    memory: "256Mi"
```
Run `kubectl apply -f trainedmodels.yaml` to create the trained models, then run `kubectl get trainedmodel` to view the resources; the expected output is sketched below.
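Since TrainedModel is a custom resource, a plain `kubectl get` shows at least the name and age; the exact columns depend on the CRD's printer columns, so expect something roughly like:

```yaml
NAME             AGE
model1-sklearn   10s
model2-sklearn   10s
```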
Run `kubectl get po` to get the name of the predictor pod. The name should be similar to `sklearn-iris-example-predictor-default-xxxxx-deployment-xxxxx`.

Run `kubectl logs <name-of-predictor-pod> -c agent` to check whether the models loaded properly. You should see output like the below; if you do not see "Downloading model" yet, wait a few minutes and try again.
```yaml
{"level":"info","ts":"2021-01-20T16:24:00.421Z","caller":"agent/puller.go:129","msg":"Downloading model from gs://kfserving-samples/models/sklearn/iris"}
{"level":"info","ts":"2021-01-20T16:24:00.421Z","caller":"agent/downloader.go:47","msg":"Downloading gs://kfserving-samples/models/sklearn/iris to model dir /mnt/models"}
{"level":"info","ts":"2021-01-20T16:24:00.424Z","caller":"agent/puller.go:121","msg":"Worker is started for model1-sklearn"}
{"level":"info","ts":"2021-01-20T16:24:00.424Z","caller":"agent/puller.go:129","msg":"Downloading model from gs://kfserving-samples/models/sklearn/iris"}
{"level":"info","ts":"2021-01-20T16:24:00.424Z","caller":"agent/downloader.go:47","msg":"Downloading gs://kfserving-samples/models/sklearn/iris to model dir /mnt/models"}
{"level":"info","ts":"2021-01-20T16:24:09.255Z","caller":"agent/puller.go:146","msg":"Successfully loaded model model2-sklearn"}
{"level":"info","ts":"2021-01-20T16:24:09.256Z","caller":"agent/puller.go:114","msg":"completion event for model model2-sklearn, in flight ops 0"}
{"level":"info","ts":"2021-01-20T16:24:09.260Z","caller":"agent/puller.go:146","msg":"Successfully loaded model model1-sklearn"}
{"level":"info","ts":"2021-01-20T16:24:09.260Z","caller":"agent/puller.go:114","msg":"completion event for model model1-sklearn, in flight ops 0"}
```

Run the command `kubectl get cm modelconfig-sklearn-iris-example-0 -oyaml` to get the config map. The output should be similar to the below.
```yaml
apiVersion: v1
data:
  models.json: '[{"modelName":"model1-sklearn","modelSpec":{"storageUri":"gs://kfserving-samples/models/sklearn/iris","framework":"sklearn","memory":"256Mi"}},{"modelName":"model2-sklearn","modelSpec":{"storageUri":"gs://kfserving-samples/models/sklearn/iris","framework":"sklearn","memory":"256Mi"}}]'
kind: ConfigMap
metadata:
  creationTimestamp: "2021-01-20T16:22:52Z"
  name: modelconfig-sklearn-iris-example-0
  namespace: default
  ownerReferences:
  - apiVersion: serving.kubeflow.org/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: InferenceService
    name: sklearn-iris-example
    uid: f91d8414-0bfa-4182-af25-5d0c1a7eff4e
  resourceVersion: "1958556"
  selfLink: /api/v1/namespaces/default/configmaps/modelconfig-sklearn-iris-example-0
  uid: 79e68f80-e31a-419b-994b-14a6159d8cc2
```

The models will be ready to serve once they are successfully loaded.

Next, determine how to reach the ingress gateway; check which of the cases below applies to you.

If the EXTERNAL-IP value is set, your environment has an external load balancer that you can use for the ingress gateway. Set the variables by running:
```bash
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
export SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris-example -n default -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```

If the EXTERNAL-IP is none, you can access the gateway using the service's node port:
```bash
# GKE
export INGRESS_HOST=worker-node-address
# Minikube
export INGRESS_HOST=$(minikube ip)
# Other environments (on-prem)
export INGRESS_HOST=$(kubectl get po -l istio=ingressgateway -n istio-system -o jsonpath='{.items[0].status.hostIP}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
```

For KIND/port forwarding:
- Run `kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80`
- In a different window, run:
```bash
export INGRESS_HOST=localhost
export INGRESS_PORT=8080
export SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris-example -n default -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```

After setting up the above:
- Go to the root directory of `kfserving`
- Query the two models:
  - Curl from the ingress gateway:
```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/model1-sklearn:predict -d @./docs/samples/v1alpha2/sklearn/iris-input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/model2-sklearn:predict -d @./docs/samples/v1alpha2/sklearn/iris-input.json
```
  - Curl from the local cluster gateway:
```bash
curl -v http://sklearn-iris-example.default/v1/models/model1-sklearn:predict -d @./docs/samples/v1alpha2/sklearn/iris-input.json
curl -v http://sklearn-iris-example.default/v1/models/model2-sklearn:predict -d @./docs/samples/v1alpha2/sklearn/iris-input.json
```

The output of each query should be:
```yaml
{"predictions": [1, 1]}
```

To remove the resources, run `kubectl delete inferenceservice sklearn-iris-example`. This deletes the InferenceService and, in turn, its trained models. You can verify the cleanup with the commands sketched below.
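A quick sanity check of the cascade, assuming the default namespace:

```bash
# The InferenceService should report NotFound...
kubectl get inferenceservice sklearn-iris-example
# ...and its TrainedModels should have been garbage-collected with it.
kubectl get trainedmodel
```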
docs/samples/v1beta1/sklearn/multimodel/inferenceservice.yaml (17 additions, 0 deletions)
```yaml
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-example"
spec:
  predictor:
    minReplicas: 1
    sklearn:
      protocolVersion: v1
      name: "sklearn-iris-predictor"
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 100m
          memory: 256Mi
```
docs/samples/v1beta1/sklearn/multimodel/trainedmodels.yaml (21 additions, 0 deletions)
```yaml
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "TrainedModel"
metadata:
  name: "model1-sklearn"
spec:
  inferenceService: "sklearn-iris-example"
  model:
    storageUri: "gs://kfserving-samples/models/sklearn/iris"
    framework: "sklearn"
    memory: "256Mi"
---
apiVersion: "serving.kubeflow.org/v1alpha1"
kind: "TrainedModel"
metadata:
  name: "model2-sklearn"
spec:
  inferenceService: "sklearn-iris-example"
  model:
    storageUri: "gs://kfserving-samples/models/sklearn/iris"
    framework: "sklearn"
    memory: "256Mi"
```