Torchserve v2 protocol (kubeflow#1870)
* feat: v2 protocol support for torchserve

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Feat: Add unit test for torchserve predictor

 - Add e2e test

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Feat: Add readme doc for v2 support

 - fix lint error
 - add grpc sample yaml
 - add tensor input generation script
 - fix model archiver to support v2 protocol

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Feat: Add gRPC client

 - fix lint error
 - update readme for gRPC

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

feat: Add custom handlers for v2 api

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Update torchserve image in test overlay

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Update test_transformer.py

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Add e2e test for grpc torchserve

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Update test configuration

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Support torchserve runtime for v2 protocol

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Fix storage uri for v2 example

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

Add grpc debug and retry config

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

fix: torchserve gRPC test

Signed-off-by: Jagadeesh J <jagadeeshj@ideas2it.com>

* fix: skip gRPC test

Signed-off-by: Jagadeesh J <jagadeeshj@ideas2it.com>
Jagadeesh J authored Dec 14, 2021
1 parent f25ca1a commit 07e4d5d
Showing 64 changed files with 1,764 additions and 3,806 deletions.
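The commit message mentions a tensor input generation script for the v2 protocol. As a rough sketch of that idea (not the script added by this commit — the file names, input name, and preprocessing below are assumptions), an image can be flattened into a KServe v2-protocol request body like this:

```python
# Hypothetical sketch: build a KServe v2-protocol request body from an image.
# "0.png", "mnist_v2.json", and the input name are illustrative, not from this commit.
import json

import numpy as np
from PIL import Image


def image_to_v2_request(path: str, name: str = "input-0") -> dict:
    # Load the image as grayscale and scale pixel values to [0, 1].
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    return {
        "inputs": [
            {
                "name": name,
                "shape": list(img.shape),
                "datatype": "FP32",
                "data": img.flatten().tolist(),
            }
        ]
    }


if __name__ == "__main__":
    with open("mnist_v2.json", "w") as f:
        json.dump(image_to_v2_request("0.png"), f)
```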
25 changes: 7 additions & 18 deletions config/configmap/inferenceservice.yaml
@@ -61,24 +61,13 @@ data:
             }
         },
         "pytorch": {
-            "v1" : {
-                "image": "kserve/pytorchserver",
-                "defaultImageVersion": "latest",
-                "defaultGpuImageVersion": "latest-gpu",
-                "supportedFrameworks": [
-                    "pytorch"
-                ],
-                "multiModelServer": false
-            },
-            "v2" : {
-                "image": "pytorch/torchserve-kfs",
-                "defaultImageVersion": "0.4.1",
-                "defaultGpuImageVersion": "0.4.1-gpu",
-                "supportedFrameworks": [
-                    "pytorch"
-                ],
-                "multiModelServer": false
-            }
+            "image": "kserve/torchserve-kfs",
+            "defaultImageVersion": "0.5.0",
+            "defaultGpuImageVersion": "0.5.0-gpu",
+            "supportedFrameworks": [
+                "pytorch"
+            ],
+            "multiModelServer": false
         },
         "triton": {
             "image": "nvcr.io/nvidia/tritonserver",
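With the predictor collapsed to a single `kserve/torchserve-kfs` entry, the serving protocol is now chosen per InferenceService rather than per configmap section. A minimal sketch of a v2 deployment, assuming the `protocolVersion` field on the pytorch predictor and an illustrative service name and storage URI:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "torchserve-v2"
spec:
  predictor:
    pytorch:
      protocolVersion: v2
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v2
```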
2 changes: 0 additions & 2 deletions config/crd/serving.kserve.io_inferenceservices.yaml
@@ -7301,8 +7301,6 @@ spec:
                           format: int32
                           type: integer
                       type: object
-                    modelClassName:
-                      type: string
                     name:
                       type: string
                     ports:
25 changes: 7 additions & 18 deletions config/overlays/test/configmap/inferenceservice.yaml
@@ -61,24 +61,13 @@ data:
             }
         },
         "pytorch": {
-            "v1" : {
-                "image": "809251082950.dkr.ecr.us-west-2.amazonaws.com/kserve/pytorchserver",
-                "defaultImageVersion": "latest",
-                "defaultGpuImageVersion": "latest-gpu",
-                "supportedFrameworks": [
-                    "pytorch"
-                ],
-                "multiModelServer": false
-            },
-            "v2" : {
-                "image": "pytorch/torchserve-kfs",
-                "defaultImageVersion": "0.4.1",
-                "defaultGpuImageVersion": "0.4.1-gpu",
-                "supportedFrameworks": [
-                    "pytorch"
-                ],
-                "multiModelServer": false
-            }
+            "image": "kserve/torchserve-kfs",
+            "defaultImageVersion": "0.5.0",
+            "defaultGpuImageVersion": "0.5.0-gpu",
+            "supportedFrameworks": [
+                "pytorch"
+            ],
+            "multiModelServer": false
         },
         "paddle": {
             "image": "ruminateer/paddleserver",
24 changes: 0 additions & 24 deletions config/runtimes/kserve-pytorchserver.yaml

This file was deleted.

9 changes: 2 additions & 7 deletions config/runtimes/kustomization.yaml
@@ -7,7 +7,6 @@ resources:
 - kserve-pmmlserver.yaml
 - kserve-paddleserver.yaml
 - kserve-lgbserver.yaml
-- kserve-pytorchserver.yaml
 - kserve-torchserve.yaml

 images:
@@ -44,10 +43,6 @@ images:
   newName: kserve/lgbserver
   newTag: latest

-- name: kserve-pytorchserver
-  newName: kserve/pytorchserver
-  newTag: latest
-
 - name: kserve-torchserve
-  newName: pytorch/torchserve-kfs
-  newTag: 0.4.1
+  newName: kserve/torchserve-kfs
+  newTag: 0.5.0
2 changes: 1 addition & 1 deletion docs/samples/README.md
@@ -26,7 +26,7 @@ After models are deployed onto model servers with KServe, you get all the follow
 | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
 | [Triton Inference Server](https://github.com/triton-inference-server/server) | [TensorFlow,TorchScript,ONNX,TensorRT](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/model_repository.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | [Compatibility Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html) | [Triton Examples](./v1beta1/triton) |
 | [TFServing](https://www.tensorflow.org/tfx/guide/serving) | [TensorFlow SavedModel](https://www.tensorflow.org/guide/saved_model) | v1 | :heavy_check_mark: | :heavy_check_mark: | [TFServing Versions](https://github.com/tensorflow/serving/releases) | [TensorFlow Examples](./v1beta1/tensorflow) |
-| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1 | :heavy_check_mark: | :heavy_check_mark: | 0.4.1 | [TorchServe Examples](./v1beta1/torchserve) |
+| [TorchServe](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | v1/v2 | :heavy_check_mark: | :heavy_check_mark: | 0.4.1 | [TorchServe Examples](./v1beta1/torchserve) |
 | [TorchServe Native](https://pytorch.org/serve/server.html) | [Eager Model/TorchScript](https://pytorch.org/docs/master/generated/torch.save.html) | native | :heavy_check_mark: | :heavy_check_mark: | 0.4.1 | [TorchServe Examples](./v1beta1/custom/torchserve) |
 | [ONNXRuntime](https://github.com/microsoft/onnxruntime) | [Exported ONNX Model](https://github.com/onnx/tutorials#converting-to-onnx-format) | v1 | :heavy_check_mark: | :heavy_check_mark: | [Compatibility](https://github.com/microsoft/onnxruntime#compatibility) | [ONNX Style Model](./v1beta1/onnx) |
 | [SKLearn MLServer](https://github.com/SeldonIO/MLServer) | [Pickled Model](https://scikit-learn.org/stable/modules/model_persistence.html) | v2 | :heavy_check_mark: | :heavy_check_mark: | 0.23.1 | [SKLearn Iris V2](./v1beta1/sklearn/v2) |
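For the v1/v2 entry added to the TorchServe row above, v2 requests go to the `/v2/models/<name>/infer` path with an `inputs` tensor payload instead of the v1 `:predict` endpoint. A hedged example of calling a TorchServe predictor over v2 REST — the InferenceService name `torchserve-v2`, model name `mnist`, request file, and Istio ingress lookup are assumptions:

```bash
# Resolve the ingress address (assumes an Istio ingress gateway with a LoadBalancer IP).
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice torchserve-v2 -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Send a v2-protocol inference request; mnist_v2.json holds an "inputs" tensor payload
# like the one sketched earlier.
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist/infer" \
  -d @./mnist_v2.json
```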
129 changes: 19 additions & 110 deletions docs/samples/v1beta1/torchserve/README.md

Large diffs are not rendered by default.

@@ -11,9 +11,11 @@ CONFIG_PATH=$BASE_PATH/config
 touch $CONFIG_PATH/config.properties

 cat <<EOF > "$CONFIG_PATH"/config.properties
-inference_address=http://0.0.0.0:8080
-management_address=http://0.0.0.0:8081
+inference_address=http://0.0.0.0:8085
+management_address=http://0.0.0.0:8085
 number_of_netty_threads=4
+enable_envvars_config=true
+install_py_dep_per_model=true
 job_queue_size=100
 model_store="$MODEL_STORE"
 model_snapshot=
153 changes: 153 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/README.md

Large diffs are not rendered by default.

@@ -1,6 +1,6 @@
 # TorchServe example with Huggingface bert model
 In this example we will show how to serve [Huggingface Transformers with TorchServe](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers)
-on KFServing.
+on KServe.

 ## Model archive file creation

@@ -1,10 +1,13 @@
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8081
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
service_envelope=kfserving
enable_envvars_config=true
install_py_dep_per_model=true
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"bert":{"1.0":{"defaultVersion":true,"marName":"BERTSeqClassification.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
@@ -1,10 +1,13 @@
inference_address=http://0.0.0.0:8085
management_address=http://0.0.0.0:8081
management_address=http://0.0.0.0:8085
metrics_address=http://0.0.0.0:8082
grpc_inference_port=7070
grpc_management_port=7071
enable_metrics_api=true
metrics_format=prometheus
number_of_netty_threads=4
job_queue_size=10
service_envelope=kfserving
enable_envvars_config=true
install_py_dep_per_model=true
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
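The `service_envelope=kfserving` line above makes TorchServe unwrap KServe v1-style request bodies. Illustratively (the exact payload is an assumption), a v1 request for this mnist model carries base64 image data in an `instances` list:

```json
{
  "instances": [
    {
      "data": "<base64-encoded image bytes>"
    }
  ]
}
```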
@@ -1,7 +1,7 @@
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
name: "torchserve"
name: "torchserve-gpu"
spec:
predictor:
pytorch:
12 changes: 12 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/grpc.yaml
@@ -0,0 +1,12 @@
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "torchserve-grpc"
spec:
  predictor:
    pytorch:
      storageUri: gs://kfserving-examples/models/torchserve/image_classifier
      ports:
        - containerPort: 7070
          name: h2c
          protocol: TCP
File renamed without changes.
104 changes: 104 additions & 0 deletions docs/samples/v1beta1/torchserve/v1/torchserve_grpc_client.py
@@ -0,0 +1,104 @@
import grpc
import inference_pb2
import inference_pb2_grpc
import management_pb2
import management_pb2_grpc
import sys


def get_inference_stub():
    channel = grpc.insecure_channel(
        'localhost:8080',
        options=(('grpc.ssl_target_name_override',
                  'torchserve-grpc.kserve-test.example.com'),))
    stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)
    return stub


def get_management_stub():
    channel = grpc.insecure_channel(
        'localhost:8081',
        options=(('grpc.ssl_target_name_override',
                  'torchserve-grpc.kserve-test.example.com'),))
    stub = management_pb2_grpc.ManagementAPIsServiceStub(channel)
    return stub


def infer(stub, model_name, model_input):
    with open(model_input, 'rb') as f:
        data = f.read()

    input_data = {'data': data}
    response = stub.Predictions(
        inference_pb2.PredictionsRequest(model_name=model_name,
                                         input=input_data))

    try:
        prediction = response.prediction.decode('utf-8')
        print(prediction)
    except grpc.RpcError:
        exit(1)


def ping(stub):
    response = stub.Ping(inference_pb2.TorchServeHealthResponse())
    try:
        health = response
        print("Ping Response:", health)
    except grpc.RpcError:
        exit(1)


def register(stub, model_name, mar_set_str):
    mar_set = set()
    if mar_set_str:
        mar_set = set(mar_set_str.split(','))
    marfile = f"{model_name}.mar"
    print(f"## Check {marfile} in mar_set :", mar_set)
    if marfile not in mar_set:
        marfile = "https://torchserve.s3.amazonaws.com/mar_files/{}.mar".format(
            model_name)

    print(f"## Register marfile:{marfile}\n")
    params = {
        'url': marfile,
        'initial_workers': 1,
        'synchronous': True,
        'model_name': model_name
    }
    try:
        stub.RegisterModel(management_pb2.RegisterModelRequest(**params))
        print(f"Model {model_name} registered successfully")
    except grpc.RpcError as e:
        print(f"Failed to register model {model_name}.")
        print(str(e.details()))
        exit(1)


def unregister(stub, model_name):
    try:
        stub.UnregisterModel(
            management_pb2.UnregisterModelRequest(model_name=model_name))
        print(f"Model {model_name} unregistered successfully")
    except grpc.RpcError as e:
        print(f"Failed to unregister model {model_name}.")
        print(str(e.details()))
        exit(1)


if __name__ == '__main__':
    # args:
    #   1-> api name [infer, register, unregister]
    #   2-> model name
    #   3-> model input for prediction
    args = sys.argv[1:]
    if args[0] == "infer":
        infer(get_inference_stub(), args[1], args[2])
    elif args[0] == "ping":
        ping(get_inference_stub())
    else:
        api = globals()[args[0]]
        if args[0] == "register":
            api(get_management_stub(), args[1], args[2])
        else:
            api(get_management_stub(), args[1])
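One possible way to exercise the client above — assuming the `torchserve-grpc` InferenceService from `grpc.yaml`, `inference_pb2*`/`management_pb2*` modules generated beforehand from TorchServe's proto files, and an illustrative model name and input file:

```bash
# Illustrative only: expose the Istio ingress gateway on the port the client dials (localhost:8080).
kubectl port-forward -n istio-system svc/istio-ingressgateway 8080:80 &

# Health check, then inference against the "mnist" model.
python torchserve_grpc_client.py ping
python torchserve_grpc_client.py infer mnist ./0.png
```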

