Add pil_to_tensor to functionals #2092

xksteven · 2020-04-10T15:00:05Z

This adds an as_tensor function as discussed in #2060 (comment).

The idea behind this function is to first convert the image into a torch.Tensor of the same dtype as the input image.

I do not have an example TIFF image to test whether the issue in #856 (comment) is addressed. Please instruct me on how to proceed. Thanks
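
To make the dtype point concrete, here is a minimal sketch (not the code from this PR) that contrasts a dtype-preserving conversion with a to_tensor-style conversion that casts to float and divides by 255. The helper names `keep_dtype` and `scaled_float` are hypothetical, and `np.array` is assumed so that the pixel data is copied.

```python
import numpy as np
import torch
from PIL import Image


def keep_dtype(pic):
    """Hypothetical helper: PIL image -> CHW tensor with the image's own dtype."""
    arr = np.array(pic)            # copies the pixel data; shape is HW or HWC
    if arr.ndim == 2:              # single-band modes such as 'L', 'I', 'F'
        arr = arr[:, :, None]
    return torch.from_numpy(arr).permute(2, 0, 1)


def scaled_float(pic):
    """Hypothetical helper mimicking the float cast and division by 255."""
    return keep_dtype(pic).to(torch.float32) / 255


rgb = Image.new("RGB", (4, 2), color=(255, 128, 0))   # width 4, height 2
raw = keep_dtype(rgb)        # stays uint8, values in 0..255
scaled = scaled_float(rgb)   # float32, values in 0..1
```

The first result keeps the integer pixel values untouched, which is the behavior the description asks for; the second is shown only for comparison.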

pmeier

In general, it seems that we always convert to numpy first and then import it. I think it would be much clearer if we did something like this:

if isinstance(pic, PIL_Image):
    pic = pil_to_numpy(pic)
elif isinstance(pic, accimage_Image):
    pic = accimage_to_numpy(pic)

return numpy_to_torchvision(pic)

torchvision/transforms/functional.py

xksteven

In general, it seems that we always convert to numpy first and then import it. I think it would be much clearer if we did something like this:
if isinstance(pic, PIL_Image):
    pic = pil_to_numpy(pic)
elif isinstance(pic, accimage_Image):
    pic = accimage_to_numpy(pic)

return numpy_to_torchvision(pic)

Are you suggesting to have a helper function for pil_to_numpy(pic) instead of including it in the same function as_tensor(pic)?

Should they be callable or should it be _pil_to_numpy(pic) so as not to expose it to public API? I think I'd prefer the latter personally.

torchvision/transforms/functional.py

pmeier · 2020-04-10T22:38:56Z

Are you suggesting to have a helper function for pil_to_numpy(pic) instead of including it in the same function as_tensor(pic)?

No strong opinion here. My point is that as is (and as it was before) the function is quite confusing. First, we handle numpy images by importing them and hit a return. Afterwards accimage images are converted to numpy and then imported and returned. Finally, we do the same for PIL images.

I think it would be much clearer to convert accimage and PIL images to numpy first without hitting a return, and finally do the import from numpy in one block.

Should they be callable or should it be _pil_to_numpy(pic) so as not to expose it to public API? I think I'd prefer the latter personally.

I suggest you do them nested within as_tensor as they are probably not useful to any other function. If you put them on the outside, I would prefer them not public.
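
A rough sketch of the control flow suggested above, under some assumptions: the name `as_tensor_sketch` and the nested helper `_pil_to_numpy` are hypothetical, the accimage branch is left out because it needs the optional accimage package, and `np.array` is used so the data is copied. It only illustrates "convert to numpy first, import once at the end".

```python
import numpy as np
import torch
from PIL import Image


def as_tensor_sketch(pic):
    """Illustrative only: normalize every input to ndarray, then import once."""

    def _pil_to_numpy(img):
        # Nested helper, so it is not exposed as public API.
        return np.array(img)

    if isinstance(pic, Image.Image):
        pic = _pil_to_numpy(pic)
    elif not isinstance(pic, np.ndarray):
        raise TypeError(f"pic should be PIL Image or ndarray. Got {type(pic)}")

    if pic.ndim == 2:
        # Single-band image: add a trailing channel axis (HW -> HWC).
        pic = pic[:, :, None]

    # Single import step shared by all input types: HWC -> CHW.
    return torch.from_numpy(np.ascontiguousarray(pic.transpose((2, 0, 1))))


gray = as_tensor_sketch(Image.new("L", (5, 3)))              # width 5, height 3
color = as_tensor_sketch(np.zeros((3, 5, 3), dtype=np.uint8))
```

There is no early return per input type, so the channel handling lives in exactly one place.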

fmassa

Thanks for the PR!

I did a first pass, let me know what you think

torchvision/transforms/functional.py

fmassa · 2020-04-15T15:59:09Z

torchvision/transforms/functional.py

+
+    img = img.view(pic.size[1], pic.size[0], len(pic.getbands()))
+    # put it from HWC to CHW format
+    img = img.permute((2, 0, 1)).contiguous()


I'm unsure if we want to call contiguous() here.
In fact, I was thinking about letting the tensor keep a different memory format (channels_last, HWC).

Could you explain your rationale for the HWC memory format?
Almost all of the downstream operations expect the CHW format so should there be a separate function that handles this?

I agree with @xksteven here. The only reason I see for changing the format is if someone just wants the import and wants to squeeze out every milli- or microsecond they can get. If that is the intention, I suggest we add a channels_first flag that defaults to True.

I miscommunicated what my intentions were, sorry about that.

What I wanted to say was that images are naturally stored as HWC, while all PyTorch operations expect CHW (up to now). But there is an ongoing effort in PyTorch to add support for channels_last, which takes tensors of shape CHW but with strides such that the memory layout is just a transposed HWC (no contiguous call).

Given that all downstream operations in torchvision should support arbitrarily-strided tensors, I would vote for returning non-contiguous tensors, so that in the future, when dedicated kernel support for channels_last is implemented in PyTorch, we will be able to handle those in an efficient manner.
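
The difference between the two options can be seen on a tiny tensor. This is only an illustrative sketch with made-up sizes: `permute` gives a CHW-shaped view over the original HWC memory, while `contiguous()` copies the data into a dense CHW layout.

```python
import torch

# A tiny "image" stored the way image files usually are: height x width x channels.
h, w, c = 2, 4, 3
hwc = torch.arange(h * w * c, dtype=torch.uint8).view(h, w, c)

# permute only changes shape and strides; no data is moved.
chw_view = hwc.permute(2, 0, 1)

# contiguous() copies the data into a dense CHW layout.
chw_dense = chw_view.contiguous()

print(tuple(chw_view.shape), chw_view.stride(), chw_view.is_contiguous())
print(tuple(chw_dense.shape), chw_dense.stride(), chw_dense.is_contiguous())
```

Both tensors have shape CHW and hold the same values; only the strides differ, which is why skipping the `contiguous()` call avoids a copy.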

fmassa · 2020-04-15T16:02:58Z

torchvision/transforms/functional.py

+    if isinstance(pic, np.ndarray):
+        # handle numpy array
+        if pic.ndim == 2:
+            pic = pic[:, :, None]
+
+        img = torch.from_numpy(pic.transpose((2, 0, 1)))
+        return img


I'm wondering if we should even support handling np.ndarray in this function.

Indeed, the data in a np.ndarray can have any format (for example, it can be a float array with values ranging from 0 to 255), and we can't properly handle all possible cases. It's the responsibility of the user to do it. Plus, if the user passes OpenCV arrays to the function, they will be in BGR format (different from the RGB used by Pillow and by torchvision).

As such, I think that we should probably only handle PIL Images -- handling numpy arrays is trivial from the user perspective (torch.as_tensor(ndarray))

I'm not really for or against keeping the numpy conversion here.

With that said I think the primary purpose of this function is doing the conversion to a pytorch tensor format and making it into a channels first format. The scope of the inputs can therefore be broadened or narrowed without really affecting the goals of the function.

The OpenCV arrays will still come out to be channels first after being passed through this function. We do not need to make any other assumptions other than the data format is HWC (or in the case of black&white images that it is HW).

So let me know if you think it's best to drop numpy.

I think what we should aim for is the least amount of potential user-errors or surprises.

Indeed, both OpenCV and scikit-image return images as ndarrays in HWC format. But the color convention is not the same, and from our perspective there is no way to know if the array is indeed HWC or not (imagine multi-band images for example).
What scares me is that the ndarray that is passed could also be CHW for some reason, and the function would just return something wrong.

For that reason, we could try to make the scope of this function to be as narrow as possible, so that we can be sure we won't be mishandling user inputs.
PIL Images and AccImage have a well-defined format representation (although I'm not sure that many people use AccImage actually), which is not the case for ndarrays which are generic data containers.

Let me know what you think
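
A small NumPy-only sketch of the concern, with made-up shapes: a function that assumes HWC cannot tell a CHW array apart and silently returns a wrongly shaped result, and the channel order (RGB vs. BGR) is equally invisible in the array itself. The helper name `hwc_to_chw` is hypothetical.

```python
import numpy as np


def hwc_to_chw(arr):
    """Assume HWC (or HW) input and move the channel axis first."""
    if arr.ndim == 2:
        arr = arr[:, :, None]
    return arr.transpose((2, 0, 1))


hwc = np.zeros((32, 48, 3), dtype=np.uint8)   # really HWC
chw = np.zeros((3, 32, 48), dtype=np.uint8)   # already CHW

ok_shape = hwc_to_chw(hwc).shape      # channels first, as intended
wrong_shape = hwc_to_chw(chw).shape   # no error, but the axes are scrambled

# Channel order cannot be detected either: same shape and dtype, different meaning.
rgb = np.zeros((1, 1, 3), dtype=np.uint8)
rgb[0, 0] = (255, 0, 0)      # pure red in RGB order
bgr = rgb[:, :, ::-1]        # the same pixel in BGR order
```

Nothing in the second call fails, which is the kind of silent mistake a PIL-only function avoids.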

I am okay with the narrowed scope and have made changes to reflect that.

As a separate point I would like to keep the functionality of ToPILImage to accept numpy arrays. That way users can still load in numpy (or other formats that they can convert to numpy) and as long as the user can convert the numpy array to PIL Image the sequence of compose will still work (as written below).

transforms.Compose([
    transforms.ToPILImage(),
    ...
    transforms.PILToTensor(),
])

Sounds good, changing ToPILImage was not in the plans (it would be a backwards-incompatible change)

fmassa · 2020-04-15T16:04:09Z

torchvision/transforms/functional.py

+def as_tensor(pic):
+    """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor of same type.
+
+    See ``AsTensor`` for more details.
+
+    Args:
+        pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
+
+    Returns:
+        Tensor: Converted image.
+    """


If we decide that this function only supports the PIL -> tensor conversion, then maybe a better name would be pil_to_tensor or something like that? Open to suggestions.

Can you take a second pass at your earliest convenience?

One of the tests is a little awkward in that ToPILImage converts FloatTensors to bytes.
The other thing was I'm unsure of the parameter name "swap_to_channelsfirst".

Let me know.

codecov-io · 2020-04-17T02:35:55Z

Codecov Report

Merging #2092 into master will decrease coverage by 0.00%.
The diff coverage is 0.00%.

@@            Coverage Diff            @@
##           master   #2092      +/-   ##
=========================================
- Coverage    0.48%   0.48%   -0.01%     
=========================================
  Files          92      92              
  Lines        7409    7428      +19     
  Branches     1128    1131       +3     
=========================================
  Hits           36      36              
- Misses       7360    7379      +19     
  Partials       13      13

Impacted Files	Coverage Δ
torchvision/transforms/functional.py	`0.00% <0.00%> (ø)`
torchvision/transforms/transforms.py	`0.00% <0.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update dec8628...2cb7a4f.

lukasHoel · 2020-05-08T19:04:00Z

torchvision/transforms/functional.py

+        return torch.as_tensor(nppic)
+
+    # handle PIL Image
+    img = torch.as_tensor(np.asarray(pic))


This line will still produce the same bug mentioned in #2194. Converting to numpy with np.asarray(pic) will keep the PIL image non-writeable. If instead we use np.array(pic), the bug #2194 would not appear. But I believe this is no real fix because np.array(pic) copies the data which might be unintended behavior here.

I think we will wait until PyTorch fixes this behavior in master, as making a copy would be fairly expensive.

If the warnings are too annoying, an alternative would be to only use asarray if the array is writeable, and use array if it is non-writeable.
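
A minimal sketch of that alternative, using only NumPy so it runs without Pillow: the helper name `to_writeable_array` is hypothetical, and a read-only ndarray stands in for the non-writeable buffer obtained from a PIL image.

```python
import numpy as np


def to_writeable_array(obj):
    """Return an ndarray for obj, copying only if the zero-copy view is read-only."""
    arr = np.asarray(obj)          # no copy when obj already is an ndarray
    if not arr.flags.writeable:
        arr = np.array(arr)        # explicit copy; the copy is writeable
    return arr


read_only = np.arange(6, dtype=np.uint8).reshape(2, 3)
read_only.setflags(write=False)    # stand-in for a non-writeable image buffer

src = np.ones((2, 3), dtype=np.uint8)
copied = to_writeable_array(read_only)   # copy was needed
shared = to_writeable_array(src)         # no copy, same object
```

The copy is paid only in the read-only case, so writeable inputs keep the zero-copy path.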

fmassa

Very sorry for the delay in reviewing this again.

I think the PR is in a very good shape, thanks a lot @xksteven !

I thought a bit more about swap_to_channelsfirst, and I think it's better to always swap to CHW. The reason is that all operations in PyTorch expect CHW sizes for images, even if the memory format (due to strides) could be HWC. So I think that we should follow this, and once memory_format support is more widespread, we can add such a flag to the function.

Once this flag is removed, this PR is ready to be merged.

fmassa · 2020-05-15T12:42:27Z

torchvision/transforms/functional.py

+
+    Args:
+        pic (PIL Image): Image to be converted to tensor.
+        swap_to_channelsfirst (bool): Boolean indicator to convert to CHW format.


After a second thought, let's remove this flag and always perform the transpose to CHW format.

fmassa

Thanks a lot @xksteven !

xksteven added 5 commits April 10, 2020 10:16

Adds as_tensor to functional.py

286e316

Similar functionality to to_tensor without the default conversion to float and division by 255. Also adds support for Image mode 'L'.

Adds tests to AsTensor()

f7eb489

Adds tests to AsTensor and removes the conversion to float and division by 255.

Adds AsTensor to transforms.py

f90b3bc

Calls the as_tensor function in functionals and adds the function AsTensor as callable from transforms.

Removes the pic.mode == 'L'

9c2fd3b

This was handled by the else condition previously so I'll remove it.

Fix Lint issue

08ab5ec

Adds two line breaks between functions to fix lint issue

pmeier reviewed Apr 10, 2020

xksteven commented Apr 10, 2020

fmassa reviewed Apr 15, 2020

xksteven added 14 commits April 15, 2020 22:49

Replace from_numpy with as_tensor

cb19ed4

Removes the extra if conditionals and replaces from_numpy with as_tensor.

Renames as_tensor to pil_to_tensor

38ad5f3

Renames the function as_tensor to pil_to_tensor and narrows the scope of the function. At the same time also creates a flag that defaults to True for swapping to the channels first format.

Renames AsTensor to PILToImage

1fa91a8

Renames the function AsTensor to PILToImage and modifies the description. Adds the swap_to_channelsfirst boolean variable to indicate if the user wishes to change the shape of the input.

Add the __init__ function to PILToTensor

0fefbcb

Add the __init__ function to PILToTensor since it contains the swap_to_channelsfirst parameter now.

fix lint issue

7662b23

remove trailing white space

Fix the tests

75be7bb

Reflects the name change to PILToTensor and the parameter to the function as well as the new narrowed scope that the function only accepts PIL images.

fix tests

123503a

Instead of undoing the transpose just create a new tensor and test that one.

Add the view back

eff1db0

Add img.view(pic.size[1], pic.size[0], len(pic.getbands())) back to outside the if condition.

fix test

266860a

fix conversion from torch tensor to PIL back to torch tensor.

fix lint issues

610fc1e

fix lint

b9cca77

remove trailing white space

Fixed the channel swapping tensor test

1b10f77

Torch transpose operates differently than numpy transpose. Changed operation to permute.

Add mode='F'

fbf661c

Add mode information when converting to PIL Image from Float Tensor.

Added inline comments to follow shape changes

598107f
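
For reference, the transpose/permute difference mentioned in the "Fixed the channel swapping tensor test" commit above can be sketched as follows (shapes are made up for illustration): NumPy's `transpose` accepts a full axis permutation, whereas `torch.Tensor.transpose` swaps exactly two dimensions and `permute` is the counterpart that takes the full permutation.

```python
import numpy as np
import torch

hwc_np = np.zeros((2, 4, 3))      # H=2, W=4, C=3
hwc_t = torch.zeros(2, 4, 3)

# NumPy: transpose takes a full permutation of the axes.
chw_np = hwc_np.transpose((2, 0, 1))

# PyTorch: transpose swaps exactly two dimensions (here giving CWH, not CHW) ...
swapped = hwc_t.transpose(0, 2)
# ... while permute takes the full permutation, like NumPy's transpose.
chw_t = hwc_t.permute(2, 0, 1)
```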

xksteven changed the title from "Add as_tensor to functionals" to "Add pil_to_tensor to functionals" Apr 17, 2020

ToPILImage converts FloatTensors to uint8

d69048e

pmeier mentioned this pull request May 8, 2020

Float PILImage not converted as writeable #2194

Open

lukasHoel reviewed May 8, 2020

fmassa reviewed May 15, 2020

Remove testing not swapping

fa1084c

xksteven added 2 commits May 15, 2020 15:53

Removes the swap_channelsfirst parameter

2cb7a4f

Makes the channel swapping the default behavior.

Remove the swap_channelsfirst argument

3d565fd

Remove the swap_channelsfirst argument and makes the swapping the default functionality.

fmassa approved these changes May 18, 2020

fmassa merged commit e6d3f8c into pytorch:master May 18, 2020

vfdev-5 mentioned this pull request Jul 29, 2021

pil_to_tensor returns non-contiguous tensor #4199

Closed
