I'm wondering why you used the NestedTensor data type instead of a simple list?

In earlier iterations of the code, we supported more input types (such as already-flattened features). Having a data type that handled different kinds of inputs (either a list of images or a list of sequences) and computed the padded output (with masking) was helpful.
The value of NestedTensor is that it provides a unified way of taking a list of Tensors, batching them together, and returning both the padded batched tensor and the mask marking the padded regions. If we instead iterated over individual images one at a time, we couldn't exploit parallelism across the batch, which would make things slower.
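For concreteness, here is a minimal sketch of what such a container looks like. The `tensors` / `mask` field names mirror DETR's implementation, but `nested_tensor_from_image_list` is a simplified, hypothetical stand-in for the real helper in `util/misc.py`:

```python
import torch

class NestedTensor:
    """Bundles a batched, padded tensor with a boolean mask marking padding."""
    def __init__(self, tensors, mask):
        self.tensors = tensors
        self.mask = mask

def nested_tensor_from_image_list(images):
    """Pad a list of [C, H, W] images to a common size and record the padding.

    Returns a NestedTensor whose `tensors` field is [B, C, H_max, W_max] and
    whose `mask` field is [B, H_max, W_max], True where pixels are padding.
    """
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    b, c = len(images), images[0].shape[0]

    batched = images[0].new_zeros((b, c, max_h, max_w))
    mask = torch.ones((b, max_h, max_w), dtype=torch.bool)
    for img, pad_img, m in zip(images, batched, mask):
        h, w = img.shape[1], img.shape[2]
        pad_img[:, :h, :w].copy_(img)   # copy the real image into the padded canvas
        m[:h, :w] = False               # real pixels are not masked
    return NestedTensor(batched, mask)
```

A single forward pass over `batched` can then exploit GPU parallelism across the whole batch, while `mask` lets downstream layers (e.g. attention) ignore the padded regions.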
It would be possible to remove NestedTensor from the implementation and return the padded tensor and mask directly. However, this kind of abstraction has already proven useful in the past: see ImageList in torchvision, in detectron2, or in maskrcnn-benchmark (where it was first introduced), which served the very similar purpose of batching together a list of arbitrarily-sized images. So we decided to keep using it here as well.
Plus, with ongoing work on developing optimized primitives for NestedTensor (see https://github.com/pytorch/nestedtensor, still a work in progress and highly experimental), we could potentially see memory / speed benefits from such representations in the future.
I believe this answers your question, so I'm closing the issue, but let us know if you have further questions.