
How to terminate a gRPC streaming request immediately during tritonserver inference with a FasterTransformer backend? #139

Open

songkq opened this issue Jun 1, 2023 · 1 comment

songkq commented Jun 1, 2023

In a production environment like ChatGPT, early termination of a conversation based on a user-client command can be a major requirement. I'm wondering whether a gRPC streaming request can be terminated immediately during tritonserver inference with a FasterTransformer backend. Could you please give some advice?

import tritonclient.grpc as grpcclient
from functools import partial

with grpcclient.InferenceServerClient(self.model_url) as client:
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer(self.model_name, request_data)
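
For context, a minimal sketch of what client-side early termination could look like with the tritonclient gRPC API. The server address, model name, and the "input_ids" tensor name/shape/dtype below are hypothetical placeholders. stop_stream() closes the stream so no further callbacks fire on the client; whether the in-flight request actually stops computing on the server depends on backend-side cancellation support, which FasterTransformer may not provide.

    import queue
    from functools import partial

    import numpy as np
    import tritonclient.grpc as grpcclient

    result_queue = queue.Queue()

    def stream_callback(result_queue, result, error):
        # Triton calls this once per streamed response (or error).
        result_queue.put(error if error is not None else result)

    # Hypothetical input tensor; match name/shape/dtype to your model.
    input_ids = grpcclient.InferInput("input_ids", [1, 8], "UINT32")
    input_ids.set_data_from_numpy(np.zeros((1, 8), dtype=np.uint32))

    client = grpcclient.InferenceServerClient("localhost:8001")
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer("fastertransformer", [input_ids])

    # On a user "stop" command: tear the stream down early. This halts
    # response delivery on the client side; the server may still run
    # the request to completion unless the backend supports cancellation.
    client.stop_stream()
    client.close()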
bigmover commented

Maybe async_stream_infer needs the input packaged first?
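
If the comment is about how request_data should be shaped: async_stream_infer takes the inputs as a list of grpcclient.InferInput objects rather than raw arrays. A short sketch, again with a hypothetical tensor name and a helper name (package_inputs) introduced only for illustration:

    import numpy as np
    import tritonclient.grpc as grpcclient

    def package_inputs(token_ids: np.ndarray):
        # Wrap a raw numpy array into the InferInput list that
        # async_stream_infer expects; "input_ids" is hypothetical,
        # so match it to the model's config.pbtxt.
        infer_input = grpcclient.InferInput(
            "input_ids", list(token_ids.shape), "UINT32")
        infer_input.set_data_from_numpy(token_ids)
        return [infer_input]

    request_data = package_inputs(np.zeros((1, 8), dtype=np.uint32))
    # client.async_stream_infer(model_name, request_data)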
