Description
At the moment, the TensorFlow/ONNX Predictor APIs run a prediction immediately as each request arrives. Instead, let the API accumulate up to batch_size requests and then run the inference on the computing hardware as a single batch. If batch_size requests haven't accumulated within a given batch_timeout window, run the inference on whatever requests have been collected so far (see the sketch below).
Add a batch_size field to the configuration file to set the batch size. By default, the field's value should be 1.
Also, add a batch_timeout field to the configuration file to tune the API's latency/throughput trade-off when batch_size > 1.
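
A minimal sketch of the accumulate-and-flush behavior described above, assuming a simple queue-based worker loop; the BATCH_SIZE/BATCH_TIMEOUT constants, the predict_batch callback, and the queue are illustrative placeholders, not Cortex's actual implementation:

```python
import queue
import threading
import time

# Hypothetical configuration values; in practice these would come from the
# API's configuration file (names follow the fields proposed above).
BATCH_SIZE = 8        # maximum number of requests per inference call
BATCH_TIMEOUT = 0.05  # seconds to wait before flushing a partial batch

request_queue = queue.Queue()

def batching_loop(predict_batch):
    """Accumulate up to BATCH_SIZE requests, or flush whatever has arrived
    once BATCH_TIMEOUT has elapsed, then run one inference over the batch."""
    while True:
        batch = [request_queue.get()]  # block until at least one request arrives
        deadline = time.monotonic() + BATCH_TIMEOUT
        while len(batch) < BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        predict_batch(batch)  # one inference call for the whole batch

# Example usage: a stand-in predictor that just reports the batch size.
if __name__ == "__main__":
    threading.Thread(
        target=batching_loop,
        args=(lambda batch: print(f"ran inference on {len(batch)} request(s)"),),
        daemon=True,
    ).start()
    for i in range(20):
        request_queue.put({"payload": i})
        time.sleep(0.01)
    time.sleep(0.5)
```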
Motivation
Batching can increase throughput substantially; the actual gain depends on the underlying hardware, the model being served, and the rate of incoming requests the API receives.