This is the top-level issue to track all the work we plan to do to make the Glow runtime support concurrent execution, pipelining, batching, and so on.
At a high level, the idea for the runtime is to be able to:
- Enqueue inputs: Run input0, then run input1 as soon as the previous run is done, etc.
- Slice the inputs into batches and transparently run them: Take N inputs and sequentially run them in batches of M (where M is the batch size the model was compiled for and N is the actual input size).
- Pipeline work across models: Run input1 on model M1, then run the result of M1 on M2 while running input2 on M1, etc. (a rough usage sketch follows this list).
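To make the intent concrete, below is a rough usage sketch. None of these types or functions (`HostRuntime`, `enqueue`, `ModelHandle`, `pipeline`) exist in Glow today; they are placeholders meant only to illustrate the enqueue/pipeline pattern, not a proposed API.

```cpp
#include <future>
#include <utility>
#include <vector>

struct Tensor {};        // stand-in for a real tensor type
using ModelHandle = int; // stand-in for a handle to a compiled model

class HostRuntime {
public:
  // Enqueue one input; the runtime should run it as soon as the device is free.
  std::future<Tensor> enqueue(ModelHandle /*model*/, Tensor input) {
    // Stub: runs synchronously. A real runtime would hand this to a device queue.
    std::promise<Tensor> p;
    p.set_value(std::move(input));
    return p.get_future();
  }
};

// Pipelining across two models: all of M1's work is queued up front, so M1 can
// already be working on input i+1 while M2 consumes the result of input i.
std::vector<Tensor> pipeline(HostRuntime &rt, ModelHandle m1, ModelHandle m2,
                             const std::vector<Tensor> &inputs) {
  std::vector<std::future<Tensor>> stage1;
  for (const auto &in : inputs)
    stage1.push_back(rt.enqueue(m1, in));
  std::vector<Tensor> out;
  for (auto &f : stage1)
    out.push_back(rt.enqueue(m2, f.get()).get());
  return out;
}

int main() {
  HostRuntime rt;
  auto results = pipeline(rt, /*m1=*/0, /*m2=*/1, {Tensor{}, Tensor{}, Tensor{}});
  return results.size() == 3 ? 0 : 1;
}
```

The key point of the pattern is that the caller only enqueues work and holds futures; overlapping the stages is the runtime's job.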
Among other things, the Glow runtime will have to:
- Manage input/output queues for each model (and communication with the devices)
- Manage incoming models
- Keep track of data dependencies and schedule the next tasks to run
- Split inputs into batches
- Pad inputs to the compiled batch size (a slicing/padding sketch follows this list)
- Dispatch workloads to devices
- Keep track of the status of devices
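As an illustration of the split/pad responsibility, here is a minimal sketch. The helper name `sliceAndPad` is hypothetical, and a real implementation would slice Glow `Tensor`s rather than plain `float` vectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Split N inputs into batches of size M (the batch size the model was compiled
// for), padding the last batch with a filler value so every batch is full.
std::vector<std::vector<float>> sliceAndPad(const std::vector<float> &inputs,
                                            std::size_t M, float pad = 0.0f) {
  std::vector<std::vector<float>> batches;
  for (std::size_t i = 0; i < inputs.size(); i += M) {
    std::vector<float> batch(inputs.begin() + i,
                             inputs.begin() + std::min(inputs.size(), i + M));
    batch.resize(M, pad); // pad the final partial batch up to M elements
    batches.push_back(std::move(batch));
  }
  return batches;
}
```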
Also, somewhat orthogonal to the runtime but related, Glow will need to:
- Determine what to run and where to run it (graph partitioning); a rough sketch follows.
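For reference, a very naive partitioning sketch is shown below, assuming a linearized graph and a per-device memory budget. The `NodeInfo` and `partitionByMemory` names are hypothetical; a real partitioner would operate on the Glow graph and also weigh communication costs between partitions.

```cpp
#include <cstddef>
#include <vector>

struct NodeInfo {
  std::size_t memBytes; // memory needed by this node's weights/activations
};

// Walk a linearized graph and assign each node to a device, moving on to the
// next device once the current device's memory budget is exhausted.
std::vector<std::size_t> partitionByMemory(const std::vector<NodeInfo> &nodes,
                                           std::size_t deviceMemBytes) {
  std::vector<std::size_t> assignment; // assignment[i] = device for node i
  std::size_t device = 0, used = 0;
  for (const auto &n : nodes) {
    if (used + n.memBytes > deviceMemBytes) { // spill over to the next device
      ++device;
      used = 0;
    }
    used += n.memBytes;
    assignment.push_back(device);
  }
  return assignment;
}
```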
As a first step, we have started by properly splitting the compilation and runtime stages.
This work is tracked in:
#2040, #1967, #1953, #1951