[NOMRG] TransformsV2 questions / comments #7092
Conversation
As discussed offline with @pmeier, we should bring back the methods in vision/references/segmentation/transforms.py, lines 54 to 64 (at 8985b59).
In V2 this is all seamlessly supported, but in order to minimize adoption friction for the new transforms, it might be a good idea to keep those methods around (as deprecated). Another use-case was transforming different batches (available at different times) with the same RNG, as in #3001 (comment). This is something that should be supported in a much better way in the future, once there is more fine-grained RNG support (#7027). For now, this can still be supported if we bring back the ...
I'm going to send a draft PR as a basis for discussion of how I envision that we could reinstate them.
format: BoundingBoxFormat
format: BoundingBoxFormat  # TODO: do not use a builtin?
# TODO: This is the size of the image, not the box. Maybe make this explicit in the name?
# Note: if this isn't user-facing, the TODO is not critical at all
It is user facing. In general, the metadata of the datapoints is considered public.
spatial_size was renamed from image_size once we added support for videos, if I recall correctly (#6736). But I agree it can be unclear whether spatial_size refers to the bbox or something else...
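For illustration, a minimal sketch of the metadata in question, assuming the prototype constructor as it stands around this PR (the import path and exact signature are assumptions):

```python
import torch
from torchvision.prototype import datapoints  # namespace assumed for this sketch

# spatial_size is the (H, W) of the image the boxes live in, not of the boxes
# themselves, which is exactly the naming concern raised above.
boxes = datapoints.BoundingBox(
    torch.tensor([[10.0, 20.0, 50.0, 80.0]]),
    format=datapoints.BoundingBoxFormat.XYXY,
    spatial_size=(480, 640),  # height/width of the underlying image
)
```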
@@ -268,4 +302,4 @@ def gaussian_blur(self, kernel_size: List[int], sigma: Optional[List[float]] = N
InputType = Union[torch.Tensor, PIL.Image.Image, Datapoint]
InputTypeJIT = torch.Tensor
InputTypeJIT = torch.Tensor  # why alias it?
To have an easier time looking up what the actual type should be. Meaning, if you see *JIT as an annotation, you can simply look up * to see what the actual type should be. If we used InputType and torch.Tensor, this relation is gone.
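A minimal sketch of the naming convention being described (the Datapoint import path is an assumption; the aliases themselves are the ones from the diff above):

```python
from typing import Union

import PIL.Image
import torch

from torchvision.prototype.datapoints import Datapoint  # import path assumed for this sketch

# Eager annotation: tensors, PIL images, and datapoint subclasses are all accepted.
InputType = Union[torch.Tensor, PIL.Image.Image, Datapoint]

# Scripted annotation: TorchScript only ever sees plain tensors, but keeping the
# *JIT suffix makes the pairing with InputType visible at a glance.
InputTypeJIT = torch.Tensor
```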
# Do we want to keep these public?
# This is probably related to internal customer needs. TODO for N: figure that out
# This is also related to allowing user-defined subclasses and transforms (in anticipation of)
In addition, resize is the most problematic one, since it actually overrides the (deprecated) tensor method with completely different behavior.
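To make the clash concrete, a small hedged illustration: the deprecated Tensor.resize (and its in-place sibling resize_) just changes the shape of the underlying storage with no interpolation, while the datapoint method of the same name does a spatial, interpolated resize (the datapoint signature below is assumed from the prototype API and left commented out):

```python
import torch

t = torch.rand(3, 8, 8)

# The tensor method: reinterprets storage with a new shape, no interpolation.
t.resize_(3, 4, 16)

# The datapoint method it shadows: an actual image resize with interpolation
# (sketched from the prototype API discussed in this PR, not imported here).
# datapoints.Image(torch.rand(3, 8, 8)).resize([16, 16])
```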
@@ -11,6 +11,12 @@
L = TypeVar("L", bound="_LabelBase")

# Do we have transforms that change the categories?
No
@@ -11,6 +11,12 @@
L = TypeVar("L", bound="_LabelBase")

# Do we have transforms that change the categories?
# Why do we need the labels to be datapoints?
Because some transformations need this information. Examples are:
class RandomMixup(_BaseMixupCutmix):
class RandomCutmix(_BaseMixupCutmix):
class SimpleCopyPaste(Transform):
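For context, a rough sketch of why a transform like mixup has to be able to tell the label apart from the rest of the sample (batch-level mixup, simplified; not the actual torchvision implementation):

```python
import torch

def mixup(images: torch.Tensor, one_hot_labels: torch.Tensor, lam: float = 0.7):
    # Mixup blends each sample with its neighbour in the batch, and the label
    # has to be blended with exactly the same coefficient. That is why the
    # transform needs to know which element of the sample is the label.
    mixed_images = lam * images + (1.0 - lam) * images.roll(1, 0)
    mixed_labels = lam * one_hot_labels + (1.0 - lam) * one_hot_labels.roll(1, 0)
    return mixed_images, mixed_labels
```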
# Saw this from the [Feedback] thread (https://github.com/pytorch/vision/issues/6753#issuecomment-1308295943)
# Not sure I understand the need to return a Tensor instead of an Image
# Is the inconsistency "worth" it? How can we make sure this isn't too unexpected?
The problem here is that most of our image kernels assume that an image is in the range [0, 1.0 if image.is_floating_point() else torch.iinfo(image.dtype).max], and this assumption is violated after normalization. The image is no longer an RGB image and we felt we needed to make this explicit. Of course any ops not related to color like crop will still work as before.
As for the impact, I feel this is fine. Normalization is usually one of the last steps, if not the last step, before something is passed into the model. Plus, since we have the tensor-to-image fallback, users can still use ops like crop afterwards without issues.
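Spelled out as a tiny hypothetical helper, the range assumption being described is:

```python
import torch

def assumed_value_range(image: torch.Tensor):
    # Float images are assumed to live in [0, 1]; integer images in
    # [0, dtype max]. Normalize breaks this assumption for float images.
    if image.is_floating_point():
        return 0.0, 1.0
    return 0, torch.iinfo(image.dtype).max
```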
Thanks for the details!
For me to better understand the trade-off, I'll try to find reasons to keep the output as an Image instead of a Tensor. LMK what you think.

we have the tensor to image fallback

I guess that also means one can do e.g. Compose(Normalize(), SomeColorTransform()), and while Normalize will output a float tensor, SomeColorTransform() will treat it as a 0-1 image? Basically, the unwrapping may not be preventing any bug?
I'm also wondering whether this might become obsolete / overkill if we remove the ColorSpace metadata, which as discussed previously is currently redundant with num_channels?

Of course any ops not related to color like crop will still work as before.

Do we have transforms that rely on the ColorSpace metadata right now? I could only find RandomPhotometricDistort (and it can probably do without)
I guess that also means one can do e.g. Compose(Normalize(), SomeColorTransform()), and while Normalize will output a float tensor, SomeColorTransform() will treat it as a 0-1 image? Basically, the unwrapping may not be preventing any bug?

Yes and yes. This is more of an idealistic choice. Value range checks are too expensive at runtime and thus we can only assume the range. Meaning, this unwrapping (or rather the absent re-wrapping) will basically have no effect at runtime.
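For concreteness, a hedged usage sketch of that point (the prototype namespace and transform names are assumptions; the second transform simply runs on the plain tensor coming out of Normalize via the tensor fallback):

```python
from torchvision.prototype import transforms  # namespace assumed for this sketch

pipeline = transforms.Compose([
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    # Runs on the plain tensor returned by Normalize, silently assuming the
    # usual [0, 1] range even though it no longer holds after normalization.
    transforms.ColorJitter(brightness=0.2),
])
```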
I'm also wondering whether this might become obsolete / overkill if we remove the ColorSpace metadata, which as discussed previously is currently redundant with num_channels?
I don't think so. datapoints.Image (and datapoints.Video) carry more implicit assumptions than just the public metadata. An Image instance implicitly communicates that its values are in the correct range, whereas a torch.Tensor is completely free. Of course, if one uses it as an image the same implicit assumptions apply, but this comes from the context rather than the type.
Do we have transforms that rely on the ColorSpace metadata right now? I could only find RandomPhotometricDistort (and it can probably do without)
For reference, vision/torchvision/prototype/transforms/_color.py, lines 127 to 128 at 93df9a5:

if isinstance(inpt, (datapoints.Image, datapoints.Video)):
    output = inpt.wrap_like(inpt, output, color_space=datapoints.ColorSpace.OTHER)  # type: ignore[arg-type]
First of all, it is an oversight that we are using two different idioms. Both Normalize and RandomPhotometricDistort should either return a torch.Tensor or an Image with color_space=ColorSpace.OTHER.
To answer the question: yes, it can do without. Apart from that, ConvertColorSpace also uses it, but we could also guess it from the channels. Although that would drop the guard we currently have against transforming arbitrary "images", i.e. the normalized ones, to a different color space. Whether this is a good thing is probably open for discussion.
Both Normalize and RandomPhotometricDistort should either return a torch.Tensor or an Image with color_space=ColorSpace.OTHER.
I think the reason for RandomPhotometricDistort to return an Image is that adjusting hue, saturation, etc. should not go outside the original color space (RGB -> RGB). However, Normalize returns a tensor due to the fact that a float Image is supposed to be between [0, 1], but the normalized output tensor can have any kind of range... So I think it is fine that they do not return the same type of object.
should not go outside the original color space (RGB -> RGB),
But then why set it to ColorSpace.OTHER instead of keeping it as ColorSpace.RGB?
It seems like there are 2 different concepts:
- The color-space of an image: RGB vs RGBA vs greyscale (... vs. YUV vs CMYK, which aren't supported in torchvision)
- The range of values, and the current assumption that it is [0, 1.0 if image.is_floating_point() else torch.iinfo(image.dtype).max]
Conceptually these are 2 different things to me. E.g. I would argue that an RGB image in [0-1] is still an RGB image even after it is normalized: the range assumption may be broken, but it's still an RGB image - just in a non-standard range. Not sure I'm convincing but (see the sketch after this list):
- normalize (out = (X - mean) / std) is "just" a generalization of convertImageDtype (out = X / 255), and convertImageDtype preserves the ColorSpace attribute.
- passing an RGB image to normalize changes its range, but it doesn't convert it to greyscale or RGBA or YUV / CMYK.
- Right now there's a 1:1 mapping between the ColorSpace and num_channels, so unless a transform changes num_channels there's no reason it should also change the color space.
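A small sketch of the first bullet using the stable functional ops (the equivalence is conceptual, up to dtype handling):

```python
import torch
from torchvision.transforms.functional import convert_image_dtype, normalize

img_u8 = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)

# convert_image_dtype is effectively out = X / 255 for uint8 -> float32 ...
scaled = convert_image_dtype(img_u8, torch.float32)

# ... which is the special case mean=0, std=255 of normalize's out = (X - mean) / std.
also_scaled = normalize(img_u8.float(), mean=[0.0, 0.0, 0.0], std=[255.0, 255.0, 255.0])

torch.testing.assert_close(scaled, also_scaled)
```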
So I think there's an argument to be made for returning Images instead of Tensors after normalize: I agree that the assumption about the range is broken, but a) this assumption isn't encoded in the ColorSpace anyway (it's encoded in the dtype), and b) returning a Tensor doesn't prevent any more issues than returning an Image.
Not sure if I'm being clear, but happy to chat more about this!
I agree that colorspace and data range are two different things, but there are relations between them, especially when we have to convert from one colorspace to another. For example, RGB to HSV is well-defined for [0, 255] and [0, 1] ranges, and the produced HSV image has predefined ranges for 8-bit, 16-bit or float32 dtypes (ref: the famous OpenCV link). So I think once an RGB image is normalized with mean, std, we can't convert it correctly to HSV space or to grayscale...

But then why set it to ColorSpace.OTHER instead of keeping it as ColorSpace.RGB?

Maybe due to the fact that we do not have BGR or GRB and other permutations of RGB...
It's definitely true that these 2 concepts start to overlap when we're talking about more color-spaces like HSV / CMYK / YUV etc. But these aren't supported in torchvision, and there is no immediate plan to support them either (I'm not saying never, but still). So at this point in time, the distinction may be somewhat premature and may lock us into a design that we might never actually need?
It's true that we do not have HSV and other colorspaces right now, and I agree that we probably won't support them anytime soon. However, we are already doing RGB to grayscale and adjusting hue/saturation by internally doing RGB to HSV conversion. I'm happy to return an Image from Normalize if it helps with API adoption etc., but we have to pay attention to output correctness and the potential issues this step could bring...
Yes, this is definitely valid; we should also make sure that returning an Image doesn't lock us in either. Do we foresee any hard issues with returning an Image?
Do we foresee any hard issues with returning an Image?
I don't. Like I stated above, this was an idealistic choice in the first place. Due to our fallback, it made no difference at runtime whatsoever.
# Also this seems like a subset of query_chw():
# - can we just have query_chw()?
# - if not, can we share code between the 2?
TBH I lost track of what exactly the status is here. We have quite a few helpers for this and I'm not sure of their state. Worth looking into.
# (This comment applies to the other query_() utils):
# Does this happen?
# What can go terribly wrong if we don't raise an error?
# Should we just document that this returns the size of the first inpt to avoid an error?
Does this happen?
Hopefully not. If it does, the input data is set up incorrectly. For example, two images of different shapes in the same sample, or a bounding_box.spatial_size attribute out of sync with the corresponding image.
What can go terribly wrong if we don't raise an error?
The first problem is that we would need to decide which value to return. sizes.pop() is not deterministic, so that would be quite bad.
Even if we can decide on a strategy to return a value, best case scenario is that the transform that requested the size fails loudly, albeit with a less expressive error message, when trying to transform an item that doesn't match the returned value. Worst case, the transform silently passes on bad input data.
The first problem is that we would need to decide which value to return. sizes.pop() is not deterministic, so that would be quite bad.
If we decide not to check for duplicates, we don't need to create a set, so we don't need to pop(), and we can return the size of the first input, which is deterministic.
My first instinct would be to be defensive and check for duplicates as done in the current code. I wonder, however, if there's value in applying the "define errors out of existence" principle here. As you mentioned, the main risk is that the transform silently returns wrong results - although it's not clear to me yet in which specific scenario that would happen.
(I'm not advocating for either option at this point, I'm merely trying to figure out the tradeoffs)
Do we have a sense of the performance hit incurred by the duplicate checks? If this gets called on large-ish batches (256 samples) with multiple inputs (images, masks, bbox, etc.), does it become noticeable?
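To make the two options concrete, a simplified sketch (the helper name and body are assumptions; the real utility also understands bounding boxes, masks, PIL images, etc., and sequence_to_str is a torchvision-internal formatting helper):

```python
from typing import Any, List, Tuple

import torch


def query_spatial_size(flat_inputs: List[Any]) -> Tuple[int, int]:
    # Simplified: only looks at plain tensors.
    sizes = {tuple(inpt.shape[-2:]) for inpt in flat_inputs if isinstance(inpt, torch.Tensor)}
    if not sizes:
        raise TypeError("No image or video was found in the sample")
    if len(sizes) > 1:
        # Defensive variant discussed above: fail loudly on inconsistent sizes ...
        raise ValueError(f"Found multiple HxW dimensions in the sample: {sorted(sizes)}")
    # ... the alternative would be to skip the check entirely and
    # deterministically return the size of the first input instead.
    h, w = sizes.pop()
    return h, w
```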
Do we have a sense of the performance hit incurred by the duplicate checks?
Nope. We only benchmarked on our references, and there we didn't notice anything.
Let's keep it this way then, thanks
raise ValueError(f"Found multiple HxW dimensions in the sample: {sequence_to_str(sorted(sizes))}") | ||
h, w = sizes.pop() | ||
return h, w | ||
|
||
|
||
# when can types_or_checks be a callable? |
is_simple_tensor was the motivating example here.
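A hedged sketch of what that looks like in practice (is_simple_tensor is simplified here; the real helper also excludes the datapoint subclasses):

```python
import PIL.Image
import torch


def is_simple_tensor(obj) -> bool:
    # Simplified stand-in: a plain torch.Tensor that is not a datapoint subclass.
    return type(obj) is torch.Tensor


# An entry in types_or_checks can therefore be a type (matched via isinstance)
# or a callable like is_simple_tensor (simply invoked on the object), which is
# what the branching in the quoted code below handles.
types_or_checks = (PIL.Image.Image, is_simple_tensor)
```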
        if isinstance(obj, type_or_check) if isinstance(type_or_check, type) else type_or_check(obj):
            return True
    return False
# Are these public because they are needed for users to implement custom transforms? |
They are not strictly needed, but they make the input checking a lot easier. If users come and look at our source to see how we do it, we should give them the same tools.
@@ -94,7 +94,7 @@ def _get_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
def _transform(
    self, inpt: Union[datapoints.ImageType, datapoints.VideoType], params: Dict[str, Any]
) -> Union[datapoints.ImageType, datapoints.VideoType]:
    if params["v"] is not None:
    if params["v"] is not None:  # What is this?
It's a value tensor or None used to erase the image
Basically it is the replacement value that is put in the "erased" area. In v1, in case we didn't find an area to erase, we return the bounding box of the whole image as well as the image itself:

vision/torchvision/transforms/transforms.py, line 1702 at 01d138d:
return 0, 0, img_h, img_w, img
With that, we call F.erase unconditionally, which ultimately leads to replacing every value in the original image with itself:

vision/torchvision/transforms/functional_tensor.py, lines 928 to 932 at 01d138d:
if not inplace:
    img = img.clone()
img[..., i : i + h, j : j + w] = v
return img
Since that is quite nonsensical, we opted to also allow None as a return value and use it as a sentinel to do nothing. I think the previous implementation came from a time when JIT didn't support Union (or Optional, for that matter) and thus we couldn't return Optional[torch.Tensor].
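Put differently, the v2 pattern being described is roughly the following (a sketch, written as a hypothetical free function; the stable functional erase is used here, and the v2 dispatcher is assumed to take the same arguments):

```python
from typing import Any, Dict

import torch
import torchvision.transforms.functional as F  # stable namespace


def maybe_erase(inpt: torch.Tensor, params: Dict[str, Any], inplace: bool = False) -> torch.Tensor:
    # params["v"] is the replacement tensor for the region (i, j, h, w), or
    # None as a sentinel meaning "no suitable region was found, do nothing".
    if params["v"] is not None:
        return F.erase(inpt, **params, inplace=inplace)
    return inpt
```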
# is "feature" now "datapoint" - or is it something else? | ||
# A: yes, TODO: update name |
See #7117.
@@ -41,13 +43,17 @@ def __new__(
tensor = cls._to_tensor(data, dtype=dtype, device=device, requires_grad=requires_grad)
return tensor.as_subclass(Datapoint)

# Is this still needed, considering we won't be releasing the prototype datasets anytime soon?
See #7154.
Co-authored-by: Philip Meier <github.pmeier@posteo.de>
Closing - we'll keep track of the progress in other places like #7217 (possibly coming back to this PR)
Just writing down my thoughts / questions below. Some of the points are already addressed / tracked in #7082.