Add support for server-side batch processing on Tensorflow/ONNX Predictors #1060

Closed
RobertLucian opened this issue May 15, 2020 · 0 comments · Fixed by #1193
Labels: enhancement (New feature or request) · example (Create or improve an example)

RobertLucian commented May 15, 2020

Description

At the moment, the Tensorflow/ONNX Predictor APIs compute a prediction as soon as each request arrives. Instead, the API should accumulate up to batch_size requests and then run inference on the whole batch on the computing hardware. If batch_size requests don't accumulate within the batch_timeout window, inference should run on whatever requests have been collected so far.

Add a batch_size field to the configuration file to set a different batch size; by default, its value should be 1.
Also, add a batch_timeout field to the configuration file to tune API latency and throughput when batch_size > 1.
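
For illustration, here is a minimal Python sketch of how such server-side batching could work. This is not Cortex's actual implementation: the RequestBatcher class, the predict_batch callable, and the queue-based design are assumptions used only to show the accumulate-then-flush behavior driven by batch_size and batch_timeout.

```python
# Illustrative sketch only; RequestBatcher, predict_batch, and the queue-based
# design are assumptions, not part of Cortex's API.
import queue
import threading
import time


class RequestBatcher:
    def __init__(self, predict_batch, batch_size=1, batch_timeout=0.1):
        self.predict_batch = predict_batch  # runs inference on a list of payloads
        self.batch_size = batch_size        # proposed config field, defaults to 1
        self.batch_timeout = batch_timeout  # proposed config field, in seconds
        self._queue = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, payload):
        """Called once per request; blocks until the batched prediction is ready."""
        result_slot = queue.Queue(maxsize=1)
        self._queue.put((payload, result_slot))
        return result_slot.get()

    def _loop(self):
        while True:
            batch = [self._queue.get()]  # block until the first request arrives
            deadline = time.monotonic() + self.batch_timeout
            # Accumulate up to batch_size requests or until batch_timeout elapses.
            while len(batch) < self.batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            payloads = [payload for payload, _ in batch]
            # Single inference pass over the accumulated batch; predict_batch is
            # assumed to return one prediction per payload, in order.
            predictions = self.predict_batch(payloads)
            for (_, slot), prediction in zip(batch, predictions):
                slot.put(prediction)
```

In this sketch, each incoming request calls submit() and waits for its result, while batch_timeout bounds the added latency and batch_size bounds how much work is handed to the hardware in one pass.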

Motivation

This increases throughput substantially; the exact gain depends on the underlying hardware, the model being served, and the rate of incoming requests the API receives.

Additional context
