
Add a guide for implementing server-side batching using the Python Predictor #1470

Closed
RobertLucian opened this issue Oct 21, 2020 · 0 comments · Fixed by #1653
Labels
docs Improvements or additions to documentation
Milestone
v0.25

Comments

RobertLucian (Member) commented Oct 21, 2020

Description

The guide can draw on the existing server-side batching documentation: https://docs.cortex.dev/deployments/realtime-api/parallelism#server-side-batching.

Template implementation:

import threading as td
import time

class PythonPredictor:
    def __init__(self, config):
        self.model = None  # initialize the model here

        self.waiter = td.Event()
        self.waiter.set()

        self.batch_max_size = config["batch_max_size"]
        self.batch_interval = config["batch_interval"]  # measured in seconds
        # one extra barrier party for the batch engine thread itself
        self.barrier = td.Barrier(self.batch_max_size + 1)

        self.samples = {}
        self.predictions = {}
        # daemon thread so the batch engine doesn't block interpreter shutdown
        td.Thread(target=self._batch_engine, daemon=True).start()

    def _batch_engine(self):
        while True:
            # wait until the predictions from the previous batch have all been consumed
            if len(self.predictions) > 0:
                time.sleep(0.001)
                continue

            # block until the batch fills up or the batch interval expires
            try:
                self.barrier.wait(self.batch_interval)
            except td.BrokenBarrierError:
                pass

            # stop accepting new samples while this batch is processed
            self.waiter.clear()
            self.predictions = {}

            self.batch_inference()

            # discard the processed samples and start accepting new ones
            self.samples = {}
            self.barrier.reset()
            self.waiter.set()

    def batch_inference(self):
        """
        Run the batch inference here.
        """
        # batch process self.samples
        # store results in self.predictions
        # make sure to write the results to self.predictions using the keys from self.samples

    def enqueue_sample(self, sample):
        """
        Enqueue a sample for batch inference. This is a blocking method.
        """
        thread_id = td.get_ident()

        # wait until the batch engine is accepting samples, then register this one
        self.waiter.wait()
        self.samples[thread_id] = sample
        try:
            # released when the batch fills up or when the batch engine's wait times out
            self.barrier.wait()
        except td.BrokenBarrierError:
            pass

    def get_prediction(self):
        """
        Return the prediction. This is a blocking method.
        """
        thread_id = td.get_ident()
        # poll until the batch engine has stored a prediction for this request's thread
        while thread_id not in self.predictions:
            time.sleep(0.001)
        prediction = self.predictions[thread_id]
        del self.predictions[thread_id]

        return prediction

    def predict(self, payload):
        self.enqueue_sample(payload)
        prediction = self.get_prediction()

        return prediction
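
For reference (not part of the original template), here is a minimal sketch of how the class above could be exercised locally, outside of Cortex. It assumes PythonPredictor is defined in the same module and fills in batch_inference with a hypothetical echo "model" so the flow can run end to end; EchoBatchPredictor, the config values, and the simulated requests are illustrative choices, not part of Cortex's API.

import threading as td

class EchoBatchPredictor(PythonPredictor):
    def batch_inference(self):
        # hypothetical "model": echo each sample back, keyed by the thread id
        # under which it was enqueued (the contract described in the template)
        self.predictions = {
            thread_id: {"echo": sample} for thread_id, sample in self.samples.items()
        }

predictor = EchoBatchPredictor({"batch_max_size": 4, "batch_interval": 0.5})
results = {}

def simulate_request(i):
    # each simulated client blocks in predict() until its batch has been run
    results[i] = predictor.predict({"sample_id": i})

threads = [td.Thread(target=simulate_request, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # one echoed prediction per simulated request

With batch_max_size set to 4, the four concurrent calls fill the barrier and get processed as a single batch; with fewer concurrent calls, the batch_interval timeout would release a partial batch instead.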

Motivation

Useful for users who need server-side batching with the Python Predictor.
This has been requested by @manneshiva.

RobertLucian added the docs label Oct 21, 2020
deliahu added this to the v0.25 milestone Dec 23, 2020