[PoC] simplify simple tensor fallback heuristic #7340

Draft · wants to merge 1 commit into main

Conversation

@pmeier (Collaborator) commented on Feb 27, 2023

Since we had some internal discussions about the heuristic before and it came up again in #7331 (comment), this PR is an attempt to simplify it while adhering to the original goals. Let's start with a little bit of context:

When transforms v2 was conceived, one major design goal was to make it backwards compatible (BC) with v1. Part of that is that we need to treat simple torch.Tensor's as images and not require users to wrap them into a datapoints.Image or similar. To achieve that, the functional API internally just dispatches to the *_image_tensor kernel, e.g.

if torch.jit.is_scripting() or is_simple_tensor(inpt):
    return horizontal_flip_image_tensor(inpt)
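
For reference, is_simple_tensor boils down to roughly the following check (a sketch; the actual helper lives in the v2 utilities and the name of the common base class may differ):

from typing import Any

import torch
from torchvision import datapoints

def is_simple_tensor(inpt: Any) -> bool:
    # A "simple" tensor is a plain torch.Tensor that is not one of the datapoint
    # subclasses (Image, Video, BoundingBox, Mask, ...). `datapoints.Datapoint`
    # stands in for that common base class here.
    return isinstance(inpt, torch.Tensor) and not isinstance(inpt, datapoints.Datapoint)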

By not adding any logic other than allowing simple tensors to be transformed, the transforms inherited this behavior. However, this proved detrimental for two reasons:

  1. After the decision was made that we'll leave datapoints.Label and datapoints.OneHotLabel in the prototype area for now, we wanted to represent them as simple tensors.
  2. Some datasets like CelebA return simple tensors as part of their annotations.

To support these use cases, the initial idea was to introduce a no-op datapoint:

class DontTouchMe(Datapoint):
    pass
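
Usage would have looked roughly like this (hypothetical, since the datapoint was never added; joint_transform is a placeholder for any v2 transform):

import torch

image = torch.rand(3, 32, 32)         # still treated as an image
label = DontTouchMe(torch.tensor(3))  # explicitly opted out of being transformed

# The transform would flip/resize/... `image`, but pass `label` through untouched.
image, label = joint_transform(image, label)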

This could be easily filtered out by the transforms. However, this again had two issues:

  1. Users are forced to address this situation by wrapping their simple tensor inputs that aren't images.
  2. We increase our API surface.

To overcome this, #7170 added a heuristic that currently behaves as follows:

# * If we find an explicit image or video (:class:`torchvision.datapoints.Image`, :class:`torchvision.datapoints.Video`,
# or :class:`PIL.Image.Image`) in the input, all other plain tensors are passed through.
# * If there is no explicit image or video, only the first plain :class:`torch.Tensor` will be transformed as image or
# video, while all others will be passed through.
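
Concretely, that means the following (a sketch assuming the v2 API on main at the time, i.e. torchvision.transforms.v2 and torchvision.datapoints; namespaces and constructors may differ slightly):

import torch
from torchvision import datapoints
from torchvision.transforms import v2 as transforms

flip = transforms.RandomHorizontalFlip(p=1.0)

plain = torch.rand(3, 8, 8)
image = datapoints.Image(torch.rand(3, 8, 8))

# An explicit image is present: `plain` is passed through unchanged.
out_image, out_plain = flip(image, plain)

# No explicit image or video in the input: the first plain tensor is transformed
# as an image, all other plain tensors are passed through.
out_a, out_b = flip(plain, torch.rand(3, 8, 8))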

This solves the issues above. However, it goes beyond the original goal of keeping BC: v1 does not support joint transformations, so allowing simple tensors to act as images in a joint context is not needed for BC. This is the part that makes the current heuristic more complicated than it has to be, since it brings the order of the inputs into the picture.

The heuristic this PR proposes takes a more pragmatic approach regarding BC:

  1. If the input is a single simple tensor, it will be transformed as an image, just as in v1.
  2. Otherwise, simple tensors will not be transformed, but rather passed through (see the sketch below).
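
Continuing the sketch from above, the proposed behavior would look roughly like this (hypothetical, since this PR is still a draft; datapoints.Mask stands in for any non-image datapoint):

# Single simple tensor: transformed as an image, exactly like in v1.
out = flip(plain)

# Joint input: the simple tensor is now passed through. To have it transformed,
# it has to be wrapped explicitly, e.g. as datapoints.Image(plain).
masks = datapoints.Mask(torch.zeros(1, 8, 8, dtype=torch.bool))
out_plain, out_masks = flip(plain, masks)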

The only thing we are losing by going for this simplification is the ability to intentionally use simple tensors as images in a joint setting. IMHO, it isn't a big ask of users to wrap into a datapoints.Image there, since they will have to wrap into datapoints.Mask and datapoints.BoundingBox anyway in a joint context.

@pmeier (Collaborator, Author) left a comment:

I intentionally only touched the Transform base class for now. Of course we also need to add these changes to the other special transforms that override forward.


  flat_outputs = [
-     self._transform(inpt, params) if needs_transform else inpt
-     for (inpt, needs_transform) in zip(flat_inputs, needs_transform_list)
+     self._transform(inpt, params) if isinstance(inpt, self._transformed_types) else inpt for inpt in flat_inputs
  ]
@pmeier (Collaborator, Author) commented on these lines:

Since we no longer care about simple tensors here, we can use the regular isinstance and can potentially eliminate

def check_type(obj: Any, types_or_checks: Tuple[Union[Type, Callable[[Any], bool]], ...]) -> bool:
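
For context, that helper roughly does the following (paraphrased; the actual implementation in the v2 utilities may differ in details):

from typing import Any, Callable, Tuple, Type, Union

def check_type(obj: Any, types_or_checks: Tuple[Union[Type, Callable[[Any], bool]], ...]) -> bool:
    # Each entry is either a plain type (checked via isinstance) or a callable
    # predicate such as is_simple_tensor (called on the object).
    for type_or_check in types_or_checks:
        if isinstance(type_or_check, type):
            if isinstance(obj, type_or_check):
                return True
        elif type_or_check(obj):
            return True
    return False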

@NicolasHug (Member) commented:

Thanks for the proposal @pmeier and for trying to simplify the heuristic.

IIUC this proposal makes this use-case invalid now:

transform(some_pure_tensor, bboxes, masks)

I agree this isn't something that was supported before, so BC requirements are lighter here. However, so far we've tried hard to keep a strong equivalence between pure tensors and images (as e.g. reflected in the dispatchers or lower-level kernels). It makes the mental model simpler (although it seems to make our code harder, that's true).

I don't really know how I feel about this right now. On one hand it seems to simplify our code, but I'm not sure it simplifies the overall mental model for users. The current heuristic is somewhat tricky, but it is irrelevant for 99% of users; in contrast, the one introduced in this PR would have to be disclosed to all users. I'm also not sure how it plays out with the fact that we unwrap subclasses for any non-transform operation, which may force users to re-wrap their input manually into Images and create friction. There's also the fact that another (opposite) direction we could take right now would be to literally get rid of the Image subclass, as it has no meta-data attached to it...

Hopefully we'll get more direct feedback from users about this and about the unwrapping mechanism, which will allow us to make a more informed decision?
