🚀 The feature
TorchServe gRPC server-side streaming support:
- The backend worker continuously sends intermediate prediction responses to the frontend.
- The frontend gRPC endpoint continuously streams the intermediate prediction responses from the backend to the client.
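The intended flow can be sketched in plain Python, using a generator to stand in for the gRPC server-streaming channel. The function and chunk names here are illustrative assumptions, not TorchServe's actual API: each `yield` corresponds to one intermediate response streamed from the backend worker, and the consuming loop plays the role of a gRPC client iterating over a server-streaming RPC.

```python
# Sketch only: a generator stands in for the gRPC server-streaming channel.
# Names (model_inference_stream, client_receive) are hypothetical, not
# TorchServe's real API.
from typing import Iterator, List


def model_inference_stream(prompt: str, num_chunks: int = 5) -> Iterator[str]:
    """Simulates a backend worker that produces intermediate results.

    A real worker would yield partial predictions as the model generates
    them; each yield maps to one streamed gRPC response message.
    """
    for i in range(num_chunks):
        # In a real deployment each chunk might take ~1 second to produce,
        # so the client sees output long before the full 5-second latency.
        yield f"chunk-{i} of prediction for {prompt!r}"


def client_receive(stream: Iterator[str]) -> List[str]:
    """Consumes the stream incrementally, as a gRPC client iterates over
    the responses of a server-streaming RPC."""
    received = []
    for partial in stream:
        received.append(partial)  # each partial arrives as soon as it is ready
    return received


parts = client_receive(model_inference_stream("hello"))
print(len(parts))  # → 5
```

The key design point is that the client does not block for the whole prediction: it processes each partial response as it arrives, which is exactly what a server-streaming RPC provides.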
Motivation, pitch
Prediction latency is usually high (e.g., 5 seconds) for large-model inference. Some models can generate intermediate prediction results (e.g., generative AI). This feature sends the intermediate prediction results to the user as soon as they are ready, so the user gradually receives the entire response. For example, the user may get the first intermediate response within 1 second and receive the complete result by 5 seconds. This improves the user's prediction experience.
Alternatives
No response
Additional context
No response