compatibility layer between stable datasets and prototype transforms? #6662

Closed
pmeier opened this issue Sep 28, 2022 · 8 comments · Fixed by #6663
@pmeier (Collaborator) commented Sep 28, 2022:

The original plan was to roll out the datasets and transforms revamp at the same time, since they somewhat depend on each other. However, it is becoming more and more likely that the prototype transforms will be finished sooner. Thus, we need some compatibility layer in the meantime. This issue explains how transforms are currently used with the datasets, what will or will not work without a compatibility layer, and what such a compatibility layer might look like.

Status quo

Most of our datasets support the transform and target_transform idiom. These transformations are applied separately to the first and second item of the raw sample returned by the dataset. For classification tasks this is usually sufficient, although I've never seen a practical use for target_transform:

dataset = torchvision.datasets.ImageFolder(
    traindir,
    presets.ClassificationPresetTrain(
        crop_size=train_crop_size,
        interpolation=interpolation,
        auto_augment_policy=auto_augment_policy,
        random_erase_prob=random_erase_prob,
    ),
)

However, the separation of the transforms breaks down when image and label need to be transformed at the same time, e.g. for CutMix or MixUp. These are currently applied through a custom collation function for the dataloader:

# mixup_transforms is a list of the RandomMixup / RandomCutmix transforms defined in the reference scripts
mixupcutmix = torchvision.transforms.RandomChoice(mixup_transforms)
collate_fn = lambda batch: mixupcutmix(*default_collate(batch))  # noqa: E731
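
For context, that collate_fn is then passed to the DataLoader, so MixUp / CutMix run on the already collated batch. A minimal sketch of the wiring (the batch size and the other loader arguments here are placeholders):

from torch.utils.data import DataLoader

data_loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    collate_fn=collate_fn,  # applies MixUp / CutMix to each collated batch
)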

Since these transforms do not work with the standard idioms, they never made it out of our references into the library.

For other tasks such as segmentation or detection, transforming input and target at the same time is not a special case but the norm. Datasets for these tasks support the transforms parameter, which is called with the complete sample and is thus able to support all use cases.

Since even datasets for the same task have very diverse outputs, there were only two options short of revamping the APIs completely:

  1. Unify the datasets' outputs on the dataset itself.
  2. Unify the datasets' outputs through a compatibility layer.

When this first came up in the past, we went with option 2: in our references, we unified the output of a few select datasets for a specific task, so that we could apply custom joint transformations to them. Since we didn't want to commit to the interface, neither the minimal compatibility layer nor the transformations made it into the library. Thus, although some of our datasets in theory support joint transformations, users have to implement them themselves.
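
To illustrate what users currently have to write themselves, here is a rough sketch of a hand-written joint transform for CocoDetection (hypothetical code, not taken from our references): it flips the image and mirrors the XYWH boxes so that both stay in sync.

import PIL.Image
import torchvision

def joint_transform(image, target):
    # flip the image and mirror the XYWH bounding boxes so that image and target stay in sync
    width, _ = image.size
    image = image.transpose(PIL.Image.FLIP_LEFT_RIGHT)
    for annotation in target:
        x, y, w, h = annotation["bbox"]
        annotation["bbox"] = [width - x - w, y, w, h]
    return image, target

dataset = torchvision.datasets.CocoDetection(..., transforms=joint_transform)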

Do we need a compatibility layer?

The new transformations support the joint use case out of the box, meaning all the custom transformations from our references are now part of the library. Plus, all transformations that previously only supported images, e.g. resizing or padding, now also support bounding boxes, masks, and so on.

The information about which part of the sample has which type is not communicated through the sample structure, i.e. "the first element is an image and the second one is a mask", but rather through the actual type of the object. We introduced several tensor subclasses that will be rolled out together with the transforms.
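
As a rough illustration of what these subclasses look like in use (the exact constructor arguments of the prototype features are assumptions on my part and may still change):

import torch
from torchvision.prototype import features

# plain data becomes recognizable to the new transforms once it is wrapped
image = features.Image(torch.rand(3, 512, 512))
boxes = features.BoundingBox(
    torch.rand(8, 4),
    format=features.BoundingBoxFormat.XYXY,
    image_size=(512, 512),
)
mask = features.Mask(torch.zeros(8, 512, 512, dtype=torch.uint8))
label = features.Label(torch.randint(0, 10, (8,)))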

By treating simple tensors, i.e. tensors that are not one of the new subclasses, as images, the new transformations are fully BC1. Thus, if you previously only used the separate transform and target_transform idiom, you can continue to do so and the new transforms will not get in your way:

import torch
from torchvision import datasets
from torchvision.prototype import transforms

transform = transforms.Compose(
    [
        transforms.PILToTensor(),
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ]
)
dataset = datasets.ImageNet(..., transform=transform)

image, label = dataset[0]
assert isinstance(image, torch.Tensor)
assert image.shape[-2:] == (224, 224)
assert isinstance(label, int)

The transforms also work out of the box if you want to stick to PIL images:

import PIL.Image
from torchvision import datasets
from torchvision.prototype import transforms

transform = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ]
)
dataset = datasets.ImageNet(..., transform=transform)

image, label = dataset[0]
assert isinstance(image, PIL.Image.Image)
assert image.size == (224, 224)
assert isinstance(label, int)

Although it seems like the new transforms could also be used out of the box if the dataset supports the transforms parameter, this is unfortunately not the case. While the new datasets will provide the sample parts wrapped into the new tensor subclasses, the old datasets, i.e. the only ones available during the roll-out of the new transforms, do not.

Without the wrapping, the transform does not pick up on bounding boxes and subsequently does not transform them:

import torch
import PIL.Image
from torchvision import datasets
from torchvision.prototype import transforms

transform = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ]
)
dataset = datasets.CocoDetection(..., transforms=transform)

image, target = dataset[0]
assert isinstance(image, PIL.Image.Image)
assert image.size == (224, 224)

assert len(target) == 8

bbox = target[2]["bbox"]
# the bounding boxes were not downsized and are thus now out of sync with the image
torch.testing.assert_close([int(coord) for coord in bbox], [249, 229, 316, 245])

segmentation = target[2]["segmentation"]
# the masks were not downsized either and are also out of sync with the image. Plus, they are still encoded and
# users have to decode them themselves
assert isinstance(segmentation, list) and all(isinstance(item, (int, float)) for item in segmentation)

Masks will be transformed, but without wrapping they are treated as regular images. This means that by default InterpolationMode.BILINEAR is used for interpolation, which corrupts the information:

import torch
from torchvision import datasets
from torchvision.prototype import transforms

transform = transforms.Compose(
    [
        transforms.PILToTensor(),
        # we convert to float here to make the bilinear interpolation visible
        transforms.ConvertImageDtype(torch.float64),
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ]
)
dataset = datasets.VOCSegmentation(..., transforms=transform)

image, mask = dataset[0]
assert isinstance(image, torch.Tensor)
assert image.shape[-2:] == (224, 224)
assert isinstance(mask, torch.Tensor)
assert mask.shape[-2:] == (224, 224)
# If the mask had been interpolated with InterpolationMode.NEAREST, mask * 255 would only contain integer values in the range [0, 255]
assert torch.any(torch.fmod(mask * 255, 1) > 0)

Thus, unless we provide a compatibility layer until our datasets do this wrapping automatically, the prototype transforms bring little real benefit to users of our datasets.

Proposal

I propose to provide a thin wrapper for the datasets that does nothing other than wrap the returned samples into the new tensor subclasses. The new object behaves exactly like the dataset did before, but upon accessing an element, i.e. calling __getitem__, the sample is wrapped before it is passed into the transforms.

from torchvision import datasets
from torchvision.prototype import transforms, features

transform = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ]
)
dataset = datasets.ImageNet(..., transform=transform)
dataset = features.VisionDatasetFeatureWrapper.from_torchvision_dataset(dataset)

image, label = dataset[0]
assert isinstance(image, features.Image)
assert image.image_size == (224, 224)
assert isinstance(label, features.Label)
assert label.to_categories() == "tench, Tinca tinca"
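
Conceptually, such a wrapper needs little more than the following sketch. This is a simplified illustration, not the actual VisionDatasetFeatureWrapper: here the dataset is assumed to be constructed without transforms and the wrapper applies them itself after wrapping, whereas the PoC in #6663 works with the transforms that were passed to the dataset.

from torch.utils.data import Dataset

class DatasetFeatureWrapperSketch(Dataset):
    def __init__(self, dataset, wrap_sample, transforms=None):
        self.dataset = dataset            # constructed without any transforms
        self.wrap_sample = wrap_sample    # callable: raw sample -> sample wrapped into the new subclasses
        self.transforms = transforms      # joint transform applied after wrapping

    def __getitem__(self, index):
        sample = self.wrap_sample(self.dataset[index])
        if self.transforms is not None:
            sample = self.transforms(sample)
        return sample

    def __len__(self):
        return len(self.dataset)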

Going back to the segmentation example from above, with the wrapper in place the segmentation mask is now correctly
interpolated with InterpolationMode.NEAREST:

import torch
from torchvision import datasets
from torchvision.prototype import transforms, features

transform = transforms.Compose(
    [
        # we convert to float here to make the bilinear interpolation visible
        transforms.ToDtype(torch.float64, features.Mask),
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ]
)
dataset = datasets.VOCSegmentation(..., transforms=transform)
dataset = features.VisionDatasetFeatureWrapper.from_torchvision_dataset(dataset)

image, mask = dataset[0]
assert isinstance(mask, torch.Tensor)
assert mask.shape[-2:] == (224, 224)
assert not torch.any(torch.fmod(mask * 255, 1) > 0)

In general, the wrapper should not change the structure of the sample unless that is necessary to properly use the new transformations. For example, the target of CocoDetection is a list of dictionaries, in which each dictionary holds the information for one object. Our models, however, require a dictionary where the value of the bounding box key is an (N, 4) tensor, where N is the number of objects. Furthermore, while our basic transforms can work with individual bounding boxes, more elaborate ones that we ported from the reference scripts also require this format.

Thus, if needed, we also perform this collation inside the wrapper:

import torch
from torchvision import datasets
from torchvision.prototype import transforms, features

transform = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
    ]
)
dataset = datasets.CocoDetection(..., transforms=transform)
dataset = features.VisionDatasetFeatureWrapper.from_torchvision_dataset(dataset)

image, target = dataset[0]

assert isinstance(image, features.Image)
assert image.shape[-2:] == (224, 224)

bbox = target["bbox"]
assert isinstance(bbox, features.BoundingBox)
assert bbox.shape == (8, 4)
torch.testing.assert_close(bbox[2].int().tolist(), [116, 106, 152, 114])

Furthermore, if the data is in an encoded state, like the masks that CocoDetection provides, it will be decoded so that it can be used directly by the transforms and models:

segmentation = target["segmentation"]
assert isinstance(segmentation, features.Mask)
assert segmentation.shape == (8, 224, 224)
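
Schematically, the collation and wrapping for a CocoDetection target could look like the following sketch. The helper name and the exact feature constructor arguments are assumptions on my part, and decoding the segmentations into a features.Mask is omitted since it additionally requires pycocotools; the actual logic lives in #6663.

from torchvision.prototype import features

def collate_coco_target(annotations, image_size):
    # stack the per-object XYWH boxes into a single (N, 4) tensor and wrap it
    boxes = features.BoundingBox(
        [annotation["bbox"] for annotation in annotations],
        format=features.BoundingBoxFormat.XYWH,
        image_size=image_size,
    )
    labels = features.Label([annotation["category_id"] for annotation in annotations])
    # decoding annotation["segmentation"] into a (N, H, W) features.Mask is omitted here
    return dict(bbox=boxes, labels=labels)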

The VisionDatasetFeatureWrapper class in the examples above is implemented as a proof of concept in #6663.

Conclusion

If we don't roll out the new datasets at the same time as the new transformations, the transformations on their own will bring little value to the user. Their whole power can only be unleashed if we add a thin compatibility layer between them and the "old" datasets. I've proposed what is, IMO, a clean implementation for such a compatibility layer.

cc @vfdev-5 @datumbox @bjuncek

Footnotes

  1. Fully BC for what is discussed here. The only thing that will be BC breaking is that the new transforms will no longer be torch.jit.script'able whereas they were before.

@NicolasHug (Member):

Thanks for the issue Philip. What is the timeline here? Do we expect the V2 transforms to be out of prototype by 0.14? Because if not, there's a chance that (at least some) new datasets will be ready by 0.15. If we're sure we'll move the transforms away from prototype before the datasets, then we should also be thinking about

a) what users will need to do when the new datasets become available. Do they remove the wrapper? Ideally they would only need to change their code once, not twice.
b) what happens if we never end up releasing the new datasets?

@pmeier (Collaborator, Author) commented Sep 29, 2022:

> What is the timeline here? Do we expect the V2 transforms to be out of prototype by 0.14? Because if not, there's a chance that (at least some) new datasets will be ready by 0.15.

I think 0.14 is unlikely given that it is right around the corner, so my guess is 0.15. But I'll let @datumbox comment on that. And indeed, if we roll out together, this discussion is moot.

> a) what users will need to do when the new datasets become available. Do they remove the wrapper? Ideally they would only need to change their code once, not twice.

My points below assume that users actually want to use the features of transforms V2. As explained in my top comment, the new transforms are BC, so users who just want to continue doing what they were doing before don't have to use the proposed compatibility layer and don't have to change anything.

That depends on how we release the datasets V2:

  • The original plan was to load them through a function by their name. This would allow us to keep the classes that build the datasets private, and in turn allow the V1 and V2 APIs to exist in the same namespace, thus keeping BC. If we go that route, users have to change their code once to use the wrappers and once again when the datasets V1 are removed.
  • Some time ago we changed this plan to also make the new classes public. This means they will replace the old classes, which will be a hard BC break. Thus, users will have to change their code once to use the wrappers and one more time to use the new datasets.

If we don't roll out at the same time, but want to actually push the new transforms from the moment they are no longer prototypes, users will probably have to change their code twice. Depending on whether we want to deprecate / remove datasets V1 at all (I remember there was some offline discussion to just keep them around, but no longer maintain them), users could also get away with one change if they just don't use the datasets V2.

> b) what happens if we never end up releasing the new datasets?

I think what we currently call datasets V2 bundles multiple things:

  1. Switching from map-style to iter-style datasets using torchdata
  2. Changing the return type from tuples to dictionaries while also returning more than just the bare minimum
  3. Wrapping the returned data into the new tensor subclasses

Each of these points can somewhat stand on its own. Still, each point is BC breaking, which is why we wanted to release them at once to avoid multiple BC breaks in subsequent versions.

If we decide to walk back on datasets V2 in its current state, we need to decide whether we keep parts of it. In some form we need 3. to unleash the power of transforms V2. We could

  1. permanently use a compatibility layer as proposed in this issue. This would keep full BC for datasets V1, and users can opt in if they want to use them with transforms V2. Of course this means a worse UX, since users now need to wrap the dataset instead of that happening automatically.
  2. BC-break the datasets V1 and wrap the output types in the new tensor subclasses. Note that we don't need to go for the dictionary output (2. from above), so this would not be as hard as going for datasets V2 completely.

@datumbox (Contributor):

> I think 0.14 is unlikely given that it is right around the corner, so my guess is 0.15. But I'll let @datumbox comment on that. And indeed, if we roll out together, this discussion is moot.

I can confirm that there is no plan to release Transforms V2 in 4 weeks. We are pretty much in active development and benchmarking. The API will remain in prototype and we can explore a path to release in Q1. Some parts of the API, such as the functionals, could be released first as they are now fully BC & JIT-scriptable, but the classes aren't, so we need to be very careful about how we roll them out.

@datumbox (Contributor):

I think the option that @pmeier mentions is viable. Whether or not we will implement it will require a lot of discussion, because ideally we would like the new Datasets and Transforms to be rolled out together. Any move that doesn't do that will hinder the adoption of both solutions and in my eyes is more of a nuclear option.

One alternative workaround for unleashing the power of Transforms if the Datasets aren't ready, but without massive BC issues, is the following. We could create a new FeatureWrapper Transform class that can be configured on the constructor with a dictionary that describes how to grab the input from the dataset (name or location in the input etc.) and map it to the appropriate _Feature type. This is not perfect, as we miss out on metadata such as the colour space, the label categories etc. But it is also self-contained within the new Transforms V2 and is pretty much just a generic solution for what we already do at #6433 to test the transforms.
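
To make the idea more concrete, such a transform could look roughly like the following sketch. This is entirely hypothetical: the spec maps keys of the target dictionary to callables that wrap the corresponding value, and the image is assumed to already be a plain tensor.

from torchvision.prototype import features

class FeatureWrapperSketch:
    def __init__(self, spec):
        # e.g. spec = {"labels": features.Label, "boxes": <callable building a features.BoundingBox>}
        self.spec = spec

    def __call__(self, image, target):
        image = features.Image(image)  # assumes a plain tensor input
        target = {key: self.spec.get(key, lambda value: value)(value) for key, value in target.items()}
        return image, target

As noted, metadata such as the image size needed for bounding boxes is not available to such a per-key wrapper, which is one of the complications discussed further down in this thread.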

@pmeier (Collaborator, Author) commented Sep 29, 2022:

> We could create a new FeatureWrapper Transform class that can be configured on the constructor to receive a dictionary that describes how to grab the input from the dataset (name or location in the input etc.) and map it to the appropriate _Feature type.

This was my first thought as well, but thinking about it more, this is more complicated than wrapping the dataset:

  1. To provide a convenient interface, users need to be able to get the right wrapper transform without manually specifying how the sample needs to be wrapped. Without this, more complicated datasets like COCO will be a pain for users to configure. Have a look at #6663 and how much dense logic is needed to wrap the CocoDetection samples.

     To provide this wrapping for the user, we need to know which dataset they are using. However, the transform needs to be created before the dataset, since it is passed to the dataset's constructor, resulting in a chicken-and-egg problem. We also can't derive the correct wrapping transform from the dataset class alone, since some datasets change their output type based on some input arguments. Thus, in general, we would need the dataset class as well as all its parameters to perform the correct wrapping.

  2. The datasets take either transform, target_transform, or transforms, so we would have to provide three different wrapping transforms.

@pmeier (Collaborator, Author) commented Oct 12, 2022:

After a longer offline discussion with @datumbox, we agreed it would be beneficial to add a wrapper transform that needs to be manually specified, in addition to the dataset wrapper proposed here. That can help in the following two use cases:

  1. Users have defined their own datasets, but still want to conveniently use the new transforms.
  2. Users use our datasets, but already have some logic in place that brings the data into the right shape and thus only need to wrap the plain tensors into the new subclasses.

The VisionDatasetFeatureWrapper from the PoC implementation in #6663 already supports this, but it would still wrap a dataset. Since we need to specify the wrapping manually anyway, the wrapping can also happen on the transform level, thus not touching the datasets at all.

There are a few new questions that we need to answer now. For illustration purposes, I'm going to use the following detection sample:

sample = (
    torch.rand(3, 512, 512),
    dict(
        area=0.0,
        labels=torch.randint(0, 10, (8,)),
        boxes=torch.rand(8, 4),
    ),
)

  1. How should users specify how the wrapping should take place? I came up with two possible variants:

    i. Mirror the sample structure with the wrapper definition:
    wrappers = ( 
        image_wrapper,
        dict(
            labels=label_wrapper,
            boxes=bounding_box_wrapper,
        ),
    )

    This is what the PoC implementation in #6663 does for now.

    ii. Specify the indices the wrappers should be applied to:
    wrappers = (
        (0, image_wrapper),
        ((1, "labels"), label_wrapper),
        ((1, "boxes"), bounding_box_wrapper),
    )
  2. Do users always need to provide a complete wrapper specification, or should we assume that everything not specified will not be wrapped? In 1. above I made that assumption, which is why area is not handled. If we decide to make this assumption, I would prefer variant ii. from above, since partially mirroring the sample structure might not always be possible. Plus, not wrapping a tensor in the input might lead to the transforms mishandling it, since it would be treated as an image. One option is to wrap it into a no-op feature, and this is what the PoC implementation in #6663 already does.

  3. How do we want to handle dependent items inside the sample? For example, the bounding_box_wrapper from above needs to know the image size, but with the setup proposed above it does not have access to it. One way to achieve this is to write a wrapper for the whole sample:

    def sample_wrapper(sample):
        image, target = sample
    
        wrapped_image = image_wrapper(image)
        image_size = wrapped_image.shape[-2:]
    
        target["labels"] = label_wrapper(target["labels"])
        target["boxes"] = bounding_box_wrapper(target["boxes"], image_size=image_size)
    
        return wrapped_image, target
    
    # Variant i.
    wrappers_mirror = sample_wrapper
    
    # Variant ii.
    wrappers_indices = [((), sample_wrapper)]

    This means users will need to write the wrapper manually, and thus cannot use the "syntax sugar" introduced in 1. to specify the transformation. Unfortunately, the datasets that benefit the most from transforms V2 fall into this category. Since users still have access to the building blocks (image_wrapper, ...), this is of course still easier than writing the wrapper from scratch.

@datumbox (Contributor):

Good call-out for the bounding_box_wrapper use case, thanks for raising it. Isn't this counter-example a deal breaker? I mean, one can definitely provide a sample_wrapper and an individual *_wrapper for each type, but what's the benefit of doing that versus just writing their own transform? The whole idea was to provide something fast and easy that reduces the amount of code they have to write for the two use cases you described. But if they need to write a custom implementation anyway, then it seems to me that the Lambda transform can do exactly that. Am I missing something?

@NicolasHug (Member):

Some quick updates on that after syncing with @pmeier:
