Support inferring batch size from tensor argument inputs #4617
Conversation
@@ -245,30 +245,6 @@ class DLTensorPythonFunctionImpl : public Operator<Backend> {
   bool synchronize_stream_;
   bool batch_processing;
   std::vector<TensorLayout> output_layouts_;
 
- private:
-  int GetCurrBatchSize(Workspace &ws) {
Getting the batch size can be done with InferBatchSizeFromInput. The extra checks were unnecessary: the EnforceUniformInput/OutputBatchSize at the operator level already does that. And as for the check on the requested batch size (ws.GetRequestedBatchSize(i) == ws.GetRequestedBatchSize(0)), the value is set uniformly by batch_sizes_.resize(NumOutput(), batch_size), so it was true anyway.
Force-pushed from 40726d2 to 916bc82
@@ -72,7 +72,7 @@ class TransformBaseOp : public SequenceOperator<Backend, true> {
 
   bool SetupImpl(std::vector<OutputDesc> &output_descs, const Workspace &ws) override {
     has_input_ = ws.NumInput() > 0;
-    auto curr_batch_size = has_input_ ? ws.GetInputBatchSize(0) : ws.GetRequestedBatchSize(0);
+    auto curr_batch_size = InferBatchSizeFromInput(ws);
This defeats the purpose of RequestedBatchSize. Instead, move InferBatchSize to the executor and invoke it prior to SetBatchSize. It will be a workaround, but at least we'll have one place where we can change/improve batch size inference instead of spreading it across operators. It also allows for some smarter solutions or a non-local batch source.
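A minimal sketch of how such an executor-level hook could look (the classes and names below are simplified stand-ins for illustration, not DALI's actual executor or Workspace API): the executor infers the batch size once, before the operator's setup runs, and publishes it as the requested batch size, so operators can keep relying on GetRequestedBatchSize.

```python
# Hypothetical, simplified sketch -- illustrative names, not DALI's real classes.

class ToyWorkspace:
    """Stand-in for an operator workspace with positional tensor inputs."""
    def __init__(self, input_batch_sizes, stage_batch_size):
        self.input_batch_sizes = list(input_batch_sizes)
        self.requested_batch_size = stage_batch_size

def infer_batch_size_from_input(ws):
    # Take the batch size from the first tensor input, if there is one.
    return ws.input_batch_sizes[0] if ws.input_batch_sizes else None

def executor_set_batch_size(ws, stage_batch_size):
    # One central place: infer from the inputs, fall back to the stage queue size,
    # and publish the result before the operator's SetupImpl is invoked.
    inferred = infer_batch_size_from_input(ws)
    ws.requested_batch_size = inferred if inferred is not None else stage_batch_size

ws = ToyWorkspace(input_batch_sizes=[7], stage_batch_size=8)
executor_set_batch_size(ws, stage_batch_size=8)
assert ws.requested_batch_size == 7  # operators simply read the requested batch size
```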
Move the batch size calculation to the executor and keep RequestedBatchSize.
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Force-pushed from 916bc82 to 7403586
def test_named_tensor_arguments(op):

ops2params = {
    fn.permute_batch: _tensor_arg_permute_batch_params,
permute_batch does have an input - although by the original design the output batch size was taken from indices. I don't think there's any special case for this op - or any other means to infer the batch size - which would mean that it now takes the batch size just from the input, not from the argument input.
I put it here on purpose, as it calls GetRequestedBatchSize manually, as an extra check that GetRequestedBatchSize returns the "split" batch size.
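For reference, here is a small, self-contained pipeline illustrating the permute_batch behaviour discussed above (it assumes a working DALI install; the shapes, values, and exact arguments are illustrative and may need adjusting for a given DALI version). The indices argument selects which input sample lands at each output position, which is why it was the natural source of the output batch size.

```python
# Illustrative sketch -- assumes nvidia.dali is installed; values are arbitrary.
import numpy as np
from nvidia.dali import fn, pipeline_def

@pipeline_def(batch_size=4, num_threads=1, device_id=None)
def permute_pipe():
    # A batch of 4 samples, where sample i is filled with the value i.
    data = fn.external_source(
        source=lambda: [np.full((2,), i, dtype=np.int32) for i in range(4)],
        batch=True)
    # Output sample i is input sample indices[i]; here the batch is reversed.
    return fn.permute_batch(data, indices=[3, 2, 1, 0])

pipe = permute_pipe()
pipe.build()
(out,) = pipe.run()
print([np.array(out[i]) for i in range(len(out))])  # samples 3, 2, 1, 0
```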
!build
CI MESSAGE: [7159582]: BUILD STARTED
CI MESSAGE: [7159582]: BUILD FAILED
CI MESSAGE: [7159582]: BUILD PASSED
* Set requested batch size based on the op tensor arguments if available
* Include arg inputs in the executor's check for all empty batch

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Category:
New feature (non-breaking change which adds functionality)
Description:
This PR updates the way the executor sets the requested batch size. If there are any tensor inputs available, the executor assumes the operator expects uniform batch sizes across inputs and outputs, and sets the requested batch size from any tensor input it can find: either a positional input or a named (argument) input. Otherwise, if none are available, the stage queue batch size is used.
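In other words, the lookup order is roughly: positional tensor inputs first, then named tensor (argument) inputs, then the stage queue batch size. A purely conceptual sketch of that priority (hypothetical helper, not the actual executor code):

```python
# Conceptual sketch of the inference order described above; not DALI code.
def infer_requested_batch_size(positional_inputs, argument_inputs, stage_batch_size):
    """positional_inputs: list of batches; argument_inputs: dict of name -> batch."""
    for batch in positional_inputs:          # 1. any positional tensor input
        return len(batch)
    for batch in argument_inputs.values():   # 2. any named (argument) tensor input
        return len(batch)
    return stage_batch_size                  # 3. fall back to the stage queue size

assert infer_requested_batch_size([["a", "b"]], {}, 8) == 2
assert infer_requested_batch_size([], {"indices": [0, 1, 2]}, 8) == 3
assert infer_requested_batch_size([], {}, 8) == 8
```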
Additional information:
The common pattern introduced with the end-to-end variable batch size was to look at the regular inputs or, if there were none, resort to GetRequestedBatchSize. This PR moves that behaviour to the executor and includes named arguments in the search for the batch size before resorting to the stage batch size. @klecki updated the autograph layer to reflect that by appropriate "hoisting" of the splits here: #4618

Affected modules and functionalities:
The regular usages of the operators should not see any difference.
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A