Batch Processing Exploration for Seldon Core
We are currently exploring ways of enabling batch functionality within Seldon Core. The first section defines the terminology, and the following sections cover the options for implementing it.
1. Batch types
We have identified two types of "Batch jobs", grouped by functionality:
1.1. Non-long running batch jobs
1.2. Long running batch jobs
Long running is defined as jobs that would take more than a few dozen seconds to provide a response, which makes the HTTP/gRPC request/response architecture unsuitable.
The scope of #1413 is limited to "Non-long running" batch jobs: our current research suggests that Seldon Core can be extended to support this type of batch job without major modifications.
Long running batch jobs are outside the scope of #1413, but we are still interested in exploring them in the medium or long term. The final section outlines our current thoughts on them.
2. Requirements
There are three key requirements that were identified for batch processing:
2.1. Jobs that are asynchronous, defined as being able to pull resources from a data source and push resources to another data source when processing is done
2.2. Jobs that only run to process the data and terminate when the finite dataset has been processed
2.3. Jobs that encompass [2.2] and can be triggered on a schedule
We tackle them by first providing a solution for [2.1] in isolation (as it is relevant for continuously running async jobs), and then for [2.1], [2.2] and [2.3] together (as they are relevant for general batch jobs).
3. Proposed Implementations
3.1. Asynchronous Jobs
The design consists of one extra component which is in charge of:
- Ingesting data from a data source or data stream (continuous polling)
- Processing request(s) by sending them to the internal engine and waiting for the response
- Sending the response back to the data source or data stream
- Notifying an external system of data point completion (success, failure, etc.)
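The four responsibilities above can be sketched as a single loop. This is only an illustration under stated assumptions: `run_ingestor`, `predict` and `notify` are hypothetical names, an in-memory `queue.Queue` stands in for the real data source and sink, and `predict` stands in for the call to the internal engine.

```python
import queue
from typing import Any, Callable, Dict


def run_ingestor(
    source: "queue.Queue[Any]",
    sink: "queue.Queue[Any]",
    predict: Callable[[Any], Any],
    notify: Callable[[Dict[str, Any]], None],
    poll_timeout: float = 0.1,
) -> int:
    """Poll `source`, send each item to `predict`, and push results to `sink`.

    Returns the number of items handled. The loop stops when the source is
    empty; a continuously running ingestor would keep polling instead.
    """
    handled = 0
    while True:
        try:
            item = source.get(timeout=poll_timeout)  # ingest (polling)
        except queue.Empty:
            break
        try:
            result = predict(item)  # send to the engine and wait for the response
            sink.put(result)  # send the response back
            notify({"item": item, "status": "success"})
        except Exception as exc:  # report per-item failure and carry on
            notify({"item": item, "status": "failure", "error": str(exc)})
        handled += 1
    return handled
```

Because every replica would run the same loop against the same queue, adding replicas is one way the work could be spread out without further coordination.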
This implementation would also be able to scale, as the ingestor consumes from the queue.
This implementation could be set up as a container within the SeldonDeployment yaml. We can leverage the componentSpecs to add an extra container, not part of the graph, from an image referenced as "seldon-data-ingestor", which would be in charge of the actions above.
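As a rough illustration of that layout, the sketch below builds such a manifest as a Python dictionary. It assumes the `machinelearning.seldon.io/v1` API version; the function name, the container names `model` and `data-ingestor`, and the predictor name `default` are hypothetical. Whether the operator accepts the extra container unchanged would need to be confirmed.

```python
import json
from typing import Any, Dict


def build_seldon_deployment(
    name: str,
    model_image: str,
    ingestor_image: str = "seldon-data-ingestor",
) -> Dict[str, Any]:
    """Return a SeldonDeployment manifest with an extra ingestor container.

    The ingestor is listed under componentSpecs but is not referenced in the
    graph, so it runs alongside the model without being an inference step.
    """
    return {
        "apiVersion": "machinelearning.seldon.io/v1",  # assumed API version
        "kind": "SeldonDeployment",
        "metadata": {"name": name},
        "spec": {
            "predictors": [
                {
                    "name": "default",  # hypothetical predictor name
                    "replicas": 1,
                    "componentSpecs": [
                        {
                            "spec": {
                                "containers": [
                                    {"name": "model", "image": model_image},
                                    # Extra container, not part of the graph.
                                    {"name": "data-ingestor", "image": ingestor_image},
                                ]
                            }
                        }
                    ],
                    "graph": {"name": "model", "type": "MODEL", "children": []},
                }
            ]
        },
    }


if __name__ == "__main__":
    # JSON is accepted wherever a YAML manifest is, so this output could be
    # reviewed or applied as-is.
    print(json.dumps(build_seldon_deployment("batch-demo", "my-model:0.1"), indent=2))
```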
3.2 Asynchronous Job + Termination + Scheduling
This section aims to achieve [2.1] + [2.2] + [2.3].
Note: Both options below ([3.2.1] and [3.2.2]) rely on the strong assumption that kube-batch can address our requirements, including the schedule functionality. This will require further investigation to confirm feasibility (https://github.com/kubernetes-sigs/kube-batch).
3.2.1 - Option 1
This option consists of two fully external components: the DataIngestor component and the Batch Job component (which would start both the SeldonDeployment and the DataIngestor, and then terminate).
The two components are:
- An extensible data ingestor container responsible for:
  - Ingesting data from a custom data source
  - Coordinating sending requests to the Executor and waiting for the response
  - Being able to hold long-running requests (60min+)
  - Uploading results / notifying termination
  - Notifying an external system of batch completion (Success, Failure)
- A Kubernetes Batch Component in charge of:
  - Turning everything off when the data ingestor finishes or fails
  - Making logs available when the job is terminated
  - Providing status (running, success, failed, etc.)
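The batch component's three duties (shutting everything down, keeping logs available, and reporting status) can be sketched as a small wrapper around the ingestor. This is a minimal sketch rather than a proposed API: `BatchJob`, `JobStatus`, `ingest` and `teardown` are hypothetical names, and `teardown` stands in for whatever would delete the SeldonDeployment and the ingestor.

```python
from enum import Enum
from typing import Callable, List


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


class BatchJob:
    """Runs an ingestor once, then releases resources whatever the outcome."""

    def __init__(self, ingest: Callable[[], None], teardown: Callable[[], None]) -> None:
        self.ingest = ingest
        self.teardown = teardown
        self.status = JobStatus.PENDING
        self.logs: List[str] = []  # stays readable after the job has terminated

    def run(self) -> JobStatus:
        self.status = JobStatus.RUNNING
        self.logs.append("job started")
        try:
            self.ingest()
            self.status = JobStatus.SUCCESS
        except Exception as exc:
            self.logs.append(f"ingestor failed: {exc}")
            self.status = JobStatus.FAILED
        finally:
            # Turn everything off whether the ingestor finished or failed.
            self.teardown()
            self.logs.append("resources released")
        return self.status
```

Putting the teardown in `finally` reflects the requirement that resources are released on both success and failure.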
Disadvantages:
- Less integrated with batch component (as may require own CRD)
- Potentially harder to handle long-running containers due to dependency on load balancer
Advantages:
- Ability to scale up data ingestor pods and SeldonDeployment pods based on HPA (as per diagram below)
- No changes / modifications to Seldon CRD / operator logic
- May require a new operator to handle creation of Seldon Deployment
This implementation would also be able to scale as more requests are sent, by leveraging HPA.
3.2.2 - Option 2
This design consists of two components:
- An extensible data ingestor container responsible for:
  - Ingesting data from a custom data source
  - Coordinating sending requests to the Executor and waiting for the response
  - Being able to hold long-running requests (60min+)
  - Uploading results / notifying termination
  - Notifying an external system of batch completion (Success, Failure)
- A Kubernetes Batch Component in charge of:
  - Turning everything off when the data ingestor finishes or fails
  - Making logs available when the job is terminated
  - Providing status (running, success, failed, etc.)
Disadvantages:
- This approach wouldn't be able to scale using HPA when loading data from a database (instead of a stream as per [3.1]). Unlike [3.2.1], the data ingestor is inside the SeldonDeployment definition, so every replica would include its own ingestor, and there would have to be some very complex logic to split the data across the multiple jobs.
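To make the splitting problem concrete, the sketch below shows the simplest static scheme: each ingestor replica derives its own row range from its index. `shard_bounds` and its parameters are hypothetical, and the scheme assumes the total row count and the number of replicas are known up front and do not change. HPA changes the replica count while the job runs, which is exactly what this scheme cannot handle and why the real logic would be more complex.

```python
from typing import Tuple


def shard_bounds(total_rows: int, num_workers: int, worker_index: int) -> Tuple[int, int]:
    """Return the half-open row range [start, end) owned by one worker.

    Rows are split as evenly as possible; the first `total_rows % num_workers`
    workers each take one extra row.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if not 0 <= worker_index < num_workers:
        raise ValueError("worker_index out of range")
    base, extra = divmod(total_rows, num_workers)
    start = worker_index * base + min(worker_index, extra)
    end = start + base + (1 if worker_index < extra else 0)
    return start, end


if __name__ == "__main__":
    # 10 rows over 3 workers: contiguous ranges with no gaps or overlaps.
    for i in range(3):
        print(i, shard_bounds(10, 3, i))
```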
4. Further exploration
This section is only a high-level exploration of how the "Long-running" batch job type could be achieved. The suggestion is that it could be possible to leverage an external framework like Airflow, through a Seldon Engine proxy that would wrap the API.
This is something that we'll be exploring once we have a better understanding of the above (as this wouldn't encompass the async or job termination requirements outlined).
They are medium-length, but should give a good set of terminology and baseline patterns as of 2016 for processing non-interactive high-throughput data.