compatibility layer between stable datasets and prototype transforms? #6662
Thanks for the issue Philip. What is the timeline here? Do we expect the V2 transforms to be out of prototype by 0.14? Because if not, there's a chance that (at least some) new datasets will be ready by 0.15. If we're sure we'll move the transforms out of prototype before the datasets: a) what will users need to do when the new datasets become available? Do they remove the wrapper? Ideally they would only need to change their code once, not twice.
I think 0.14 is unlikely given that it is right around the corner, so my guess is 0.15. But I'll let @datumbox comment on that. And indeed, if we roll out together, this discussion is moot.
My points below assume that the users actually want to use the features of transforms V2. As explained in my top comment, the transforms are BC, so users who just want to continue doing what they were doing before don't have to use the proposed compatibility layer and don't have to change anything. Whether users need to change their code once or twice depends on how we are releasing the datasets V2:
If we don't roll out at the same time, but want to actually push the new transforms from the moment they are no longer prototypes, users probably have to change twice. Depending on whether we want to deprecate / remove datasets V1 at all (I remember there was some offline discussion to just keep them around, but no longer maintain them), users could also get away with one change if they just don't use the datasets V2.
I think what we currently call datasets V2 bundles multiple things:
Each of these points can somewhat stand on their own. Still, each point is BC breaking, which is why we wanted to release them at once to avoid multiple BC breaks in subsequent versions. If we decide to walk back on datasets V2 in its current state, we need to decide if we keep parts of it. In some form we need 3. to unleash the power of transforms V2. We could
I can confirm that there is no plan to release Transforms V2 in 4 weeks. We are pretty much in active development and benchmarking. The API will remain in prototype and we can explore a path to release in Q1. Some parts of the API, such as the functional part, could be released first as they are now fully BC & JIT-scriptable, but the classes aren't, so we need to be very careful about how we roll them out.
I think the option that @pmeier mentions is viable. Whether or not we will implement it will require a lot of discussion, because ideally we would like the new Datasets and Transforms to be rolled out together. Any move that doesn't do that will hinder the adoption of the solutions and in my eyes is more of a nuclear option. One alternative workaround for unleashing the power of Transforms if Datasets aren't ready, but without massive BC issues, is the following. We could create a new
This was my first thought as well, but thinking about it more, this is more complicated than wrapping the dataset:
After a longer offline discussion with @datumbox, we agreed it would be beneficial to add a wrapper transform that needs to be manually specified, in addition to the dataset wrapper proposed here. That can help in the following two use cases:
There are a few new questions that we need to answer now. For illustration purposes, I'm going to use the following detection sample:

```python
sample = (
    torch.rand(3, 512, 512),
    dict(
        area=0.0,
        labels=torch.randint(0, 10, (8,)),
        boxes=torch.rand(8, 4),
    ),
)
```
Good call out for the
Some quick updates on that after syncing with @pmeier:
The original plan was to roll out the datasets and transforms revamp at the same time since they somewhat depend on each other. However, it is becoming more and more likely that the prototype transforms will be finished sooner. Thus, we need some compatibility layer in the meantime. This issue explains how transforms are currently used with the datasets, what will or will not work without a compatibility layer, and what such a compatibility layer might look like.
Status quo
Most of our datasets support the `transform` and `target_transform` idiom. These transformations are applied separately to the first and second item of the raw sample returned by the dataset. For classification tasks this is usually sufficient, although I've never seen a practical use for `target_transform`:

vision/references/classification/train.py, Lines 129 to 137 in 0fcfaa1
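For concreteness, a minimal sketch of that idiom with a stable dataset (the lambda `target_transform` is purely illustrative):

```python
import torch
from torchvision import datasets, transforms

dataset = datasets.CIFAR10(
    root="data",
    download=True,
    transform=transforms.ToTensor(),             # applied to the image only
    target_transform=lambda y: torch.tensor(y),  # applied to the label only
)
image, label = dataset[0]
```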
However, the separation of the transforms breaks down in case image and label need to be transformed at the same time, e.g. `CutMix` or `MixUp`. They are currently applied through a custom collation function for the dataloader:

vision/references/classification/train.py, Lines 209 to 210 in 3c9ae0a
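The pattern in the references looks roughly like this sketch, assuming a `mixup` callable that jointly transforms a batch of images and labels (both `mixup` and `dataset` are placeholders here):

```python
from torch.utils.data import DataLoader, default_collate

def collate_fn(batch):
    # collate the list of (image, label) pairs first, then apply the joint
    # transform to the whole batch of images and labels together
    return mixup(*default_collate(batch))

data_loader = DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
```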
Since these transforms do not work with the standard idioms, they never made it out of our references into the library.
The need to transform input and target at the same time is not a special case for other tasks such as segmentation or detection. Datasets for these tasks support the `transforms` parameter. It will be called with the complete sample and is thus able to support all use cases. Since even datasets for the same task have very diverse outputs, there were only two options without revamping the APIs completely:

1. Unify the outputs of the datasets inside the library, which would have been BC breaking.
2. Keep the datasets as they are and unify the outputs externally wherever joint transforms are needed.
When this first came up in the past, we went with option 2. In our references we unified the output for a few select datasets for a specific task, so we could apply custom joint transformations to them. Since we didn't want to commit to the interface, neither the minimal compatibility layer nor the transformations made it into the library. Thus, although some of our datasets in theory support joint transformations, users have to implement them themselves.
Do we need a compatibility layer?
The new transformations support the joint use case out of the box, meaning all the custom transformations from our references are now part of the library. Plus, all transformations that previously only supported images, e.g. resizing or padding, now also support bounding boxes, masks, and so on.
The information about which part of the sample is of which type is not communicated through the sample structure, i.e. "the first element is an image and the second one is a mask", but rather through the actual type of the object. We introduced several tensor subclasses that will be rolled out together with the transforms.
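For illustration, a sketch using the prototype namespace as it currently stands (the class names are still in flux and may change before release):

```python
import torch
from torchvision.prototype import features

# a plain tensor carries no type information; the transforms treat it as an image
plain = torch.rand(3, 512, 512)

# the subclasses are regular tensors that additionally carry their type
image = features.Image(torch.rand(3, 512, 512))
mask = features.Mask(torch.randint(0, 2, (512, 512), dtype=torch.uint8))

assert isinstance(image, torch.Tensor)  # still usable wherever a tensor is
```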
By treating simple tensors, i.e. not the new subclasses, as images, the new transformations are fully BC[^1]. Thus, if you previously only used the separated `transform` and `target_transform` idiom, you can continue to do that and the new transforms will not get in your way.

The transforms also work out of the box if you want to stick to PIL images:
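A minimal sketch of that, again assuming the prototype namespace:

```python
import PIL.Image
from torchvision.prototype import transforms as transforms_v2

transform = transforms_v2.RandomHorizontalFlip(p=1.0)

pil_image = PIL.Image.new("RGB", (512, 512))
flipped = transform(pil_image)  # PIL image in, PIL image out
```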
Although it seems the new transforms can also be used out of the box if the dataset supports the `transforms` parameter, this is unfortunately not the case. While the new datasets will provide the sample parts wrapped into the new tensor subclasses, the old datasets, i.e. the only ones available during the roll-out of the new transforms, do not.

Without the wrapping, the transform does not pick up on bounding boxes and subsequently does not transform them:
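A sketch of that failure mode, reusing the detection sample from above (prototype namespace assumed; the pass-through behavior is as described here):

```python
import torch
from torchvision.prototype import transforms as transforms_v2

sample = (
    torch.rand(3, 512, 512),
    dict(labels=torch.randint(0, 10, (8,)), boxes=torch.rand(8, 4) * 512),
)

transformed = transforms_v2.Resize((256, 256))(sample)
# the image is resized, but the boxes come back unchanged: nothing marks
# that plain (8, 4) tensor as bounding boxes
```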
Masks will be transformed, but without wrapping they will be treated as normal images. This means that, by default, `InterpolationMode.BILINEAR` is used for interpolation, which will corrupt the information (the sketch below illustrates the corruption):

Thus, if we don't provide a compatibility layer until our datasets wrap automatically, the prototype transforms don't bring any real benefit to the user of our datasets.
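A minimal illustration using only the stable API:

```python
import torch
from torchvision.transforms import InterpolationMode
from torchvision.transforms.functional import resize

# a 4x4 segmentation mask with two class ids, 0 and 7
mask = torch.zeros(1, 4, 4, dtype=torch.float32)
mask[0, :2, :] = 7.0

bilinear = resize(mask, [8, 8], interpolation=InterpolationMode.BILINEAR)
nearest = resize(mask, [8, 8], interpolation=InterpolationMode.NEAREST)

print(bilinear.unique())  # blended, non-integer values at the class boundary
print(nearest.unique())   # tensor([0., 7.]): class ids preserved
```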
Proposal
I propose to provide a thin wrapper for the datasets that does nothing other than wrap the returned samples into the new tensor subclasses. This means that the new object behaves exactly like the dataset did before, but upon accessing an element, i.e. calling `__getitem__`, we wrap the sample before passing it into the transforms.

Going back to the segmentation example from above, with the wrapper in place the segmentation mask is now correctly interpolated with `InterpolationMode.NEAREST`:
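At its core, such a wrapper could look roughly like the following sketch (class name and structure are illustrative; the actual proof of concept lives in #6663):

```python
from torch.utils.data import Dataset

class FeatureWrapperDataset(Dataset):
    """Illustrative sketch: wraps raw samples into the new tensor subclasses
    before the joint transforms see them."""

    def __init__(self, dataset, wrap_sample, transforms=None):
        self.dataset = dataset          # any "old" dataset returning raw samples
        self.wrap_sample = wrap_sample  # callable: raw sample -> wrapped sample
        self.transforms = transforms

    def __getitem__(self, idx):
        sample = self.wrap_sample(self.dataset[idx])
        if self.transforms is not None:
            sample = self.transforms(sample)
        return sample

    def __len__(self):
        return len(self.dataset)
```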
In general, the wrapper should not change the structure of the sample unless it is necessary to be able to properly use the new transformations. For example, the `target` of `COCODetection` is a list of dictionaries, in which each dictionary holds the information for one object. Our models however require a dictionary where the value of the bounding box key is an `(N, 4)` tensor, where `N` is the number of objects. Furthermore, while our basic transforms can work with individual bounding boxes, the more elaborate ones that we ported from the reference scripts also require this format. Thus, if needed, we also perform this collation inside the wrapper:
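A sketch of that collation for COCO-style targets ("bbox" and "category_id" are standard COCO annotation keys; the output keys and any box-format conversion are illustrative and omitted here):

```python
import torch

def collate_coco_target(target):
    # target: the list of per-object annotation dicts the dataset returns
    boxes = torch.as_tensor([obj["bbox"] for obj in target], dtype=torch.float32)
    return {
        "boxes": boxes.reshape(-1, 4),  # (N, 4), also correct for N == 0
        "labels": torch.as_tensor([obj["category_id"] for obj in target], dtype=torch.int64),
    }
```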
Furthermore, if the data is in an encoded state, like the masks `COCODetection` provides, it will be decoded so it can be used directly by the transforms and models:
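The decoding could look similar to the helper used in the detection reference scripts (requires `pycocotools`; shown here as a sketch):

```python
import torch
from pycocotools import mask as coco_mask

def convert_coco_poly_to_mask(segmentations, height, width):
    # decode polygon / RLE segmentations into a dense (N, H, W) mask tensor
    masks = []
    for polygons in segmentations:
        rles = coco_mask.frPyObjects(polygons, height, width)
        mask = coco_mask.decode(rles)
        if mask.ndim < 3:
            mask = mask[..., None]
        # merge the per-polygon channels into a single mask per object
        masks.append(torch.as_tensor(mask, dtype=torch.uint8).any(dim=2))
    if masks:
        return torch.stack(masks)
    return torch.zeros((0, height, width), dtype=torch.bool)
```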
The `VisionDatasetFeatureWrapper` class in the examples above is implemented as a proof of concept in #6663.

Conclusion
If we don't roll out the new datasets at the same time as the new transformations, the transformations on their own will bring little value to the user. Their whole power can only be unleashed if we add a thin compatibility layer between them and the "old" datasets. I've proposed an (IMO clean) implementation of such a compatibility layer.
cc @vfdev-5 @datumbox @bjuncek
Footnotes
[^1]: Fully BC for what is discussed here. The only thing that will be BC breaking is that the new transforms will no longer be `torch.jit.script`'able, whereas they were before.