-
Notifications
You must be signed in to change notification settings - Fork 630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add inflate operator #4366
Add inflate operator #4366
Conversation
986c5bb
to
c14eb76
Compare
CI MESSAGE: [6233278]: BUILD STARTED |
CI MESSAGE: [6233972]: BUILD STARTED |
CI MESSAGE: [6234125]: BUILD STARTED |
CI MESSAGE: [6234125]: BUILD FAILED |
CI MESSAGE: [6240852]: BUILD STARTED |
CI MESSAGE: [6240852]: BUILD FAILED |
CI MESSAGE: [6242387]: BUILD STARTED |
CI MESSAGE: [6242387]: BUILD FAILED |
CI MESSAGE: [6243748]: BUILD STARTED |
CI MESSAGE: [6243748]: BUILD FAILED |
CI MESSAGE: [6251418]: BUILD STARTED |
CI MESSAGE: [6251418]: BUILD FAILED |
.NumOutput(1) | ||
.AddArg(inflate::shapeArgName, "The shape of the output (inflated) chunk.", DALI_INT_VEC, true) | ||
.AddOptionalTypeArg(inflate::dTypeArgName, "The output (inflated) data type.", DALI_NO_TYPE) | ||
.AddOptionalArg(inflate::offsetArgName, R"code("A list of offsets within the input sample |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably a matter of taste, but I don't like these constants - they make the code only infinitesimally "harder" but much harder to read.
CI MESSAGE: [6362583]: BUILD STARTED |
If the sample is comprised of multiple chunks, the ``chunks_offsets`` or ``chunks_sizes`` | ||
must be specified. In that case, the ``shape`` must describe the shape of a single inflated | ||
(output) chunk. The number of the chunks will automatically be added as the leftmost extent | ||
to the output tensors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be useful to extend this API in the future with the alternative, where you provide the number of chunks instead of the offset. In same cases I imagine it would be easier to know it.
@has_operator("experimental.inflate") | ||
def test_sample_inflate(): | ||
seed = 42 | ||
for batch_size in [1, 8, 64, 256, 348]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these batch sizes in any way significant to the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really tbh, 1 is "special case", 8 is small, 348 big and the remaining 2 medium ones. I though there may be some issues that would show up for small or big batch sizes specifically. But maybe there's too much cases.
yield _test_validation, pipeline_2d_shape, "The shape argument must be a scalar or a 1D tensor" | ||
yield _test_validation, pipeline_non_elementary_dtype, \ | ||
"The inflate output type must have floating point or integral type" | ||
yield _test_validation, pipeline_input_float, "Got tensor of type `float` instead" | ||
yield _test_validation, pipeline_input_scalar, "Got input with 0 dimensions instead" | ||
yield _test_validation, pipeline_input_algorithm, \ | ||
"Unknown inflate algorithm was specified for `algorithm` argument" | ||
yield _test_validation, pipeline_too_big_chunk, "Input chunk size cannot exceed the sample size" | ||
yield _test_validation, pipeline_too_big_chunks, \ | ||
"The sum of chunk sizes for sample of idx 0 exceeds the total size of the sample." | ||
yield _test_validation, pipeline_empty_chunk, "Got chunk size 0 for sample of idx 0" | ||
yield _test_validation, pipeline_neg_chunk, "Got chunk size -1 for sample of idx 0" | ||
yield _test_validation, pipeline_too_big_offsets, \ | ||
"Got chunk offset 5 while the sample size is 5 for sample of idx 0" | ||
yield _test_validation, pipeline_too_zero_size_inferred, \ | ||
"The inferred size of a chunk would be non-positive for sample of idx 0" | ||
yield _test_validation, pipeline_sizes_offsets_mismatched, \ | ||
"for sample of idx 0 there are 2 offsets and 3 sizes" | ||
yield _test_validation, pipeline_negative_offset, \ | ||
"Input chunks offsets must be non-negative" | ||
yield _test_validation, pipeline_chunk_exceeding_sample, \ | ||
"Input chunk cannot exceed the sample size" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, please do not use tests yielding because it is impossible to properly discover them and run them in parallel in nose2
. If you need to have parameterized tests you can look into: https://docs.nose2.io/en/latest/params.html
CI MESSAGE: [6362583]: BUILD PASSED |
|
||
Each input sample can either be a single compressed chunk or consist of multiple | ||
compressed chunks that have the same shape and type when inflated, so that they can be | ||
be merged into a single tensor where the leftmost extent of the tensor corresponds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be merged into a single tensor where the leftmost extent of the tensor corresponds | |
be merged into a single tensor where the outermost extent of the tensor corresponds |
...but much more importantly - what does it even mean?
Are the chunks visible in the output shape? Shouldn't chunking be just an internal detail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real use case it targets are 3D sequences of single channel HW images. Each image is compressed separately and the results are concatenated. Using chunks_offsets
or chunks_sizes
you can do the reverse: uncompress the chunks into a sequence. So compressed chunks translate to "frames" in the tensor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outermost
still applies - we never use leftmost
in this context (the word appears 6 times in our code base and refers to position within image/array, not to dimensions). There are 67 occurrences of outermost
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've resolved that disccusion by misclick and couldn't find this anywhere. :p
Anyway, renamed it.
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Rebasing on the CI fixes |
CI MESSAGE: [6432468]: BUILD STARTED |
CI MESSAGE: [6432469]: BUILD FAILED |
CI MESSAGE: [6432477]: BUILD STARTED |
CI MESSAGE: [6432477]: BUILD PASSED |
CI MESSAGE: [6432468]: BUILD FAILED |
CI MESSAGE: [6432468]: BUILD PASSED |
Category:
New feature (non-breaking change which adds functionality)
Description:
Adds inflate operator that allows to decompress batch of compressed samples (and batch of sequences of compressed samples). For now, only LZ4 compressin is supported.
The decompression is done with nvCOMP library. Note, nvCOMP is not open source, the nvCOMP SDK licencse is included in the Acknwoledgment: #4368.
If the samples contain multiple compressed chunks, either chunk_sizes or chunk_offsets parameter must be provided. If both are provided, the order of the chunks in the input and output can be different.
Additional information:
For now, we build with the nvCOMP (and thus without the operator) only for x86 due to incompatible glibcxx in the nvCOMP's aarch build.
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: DECOMP.02-08.
JIRA TASK: DALI-3049