
transforms.ToTensor() for numpy float array in the range of [0.0, 255.0] #546


Open
InnovArul opened this issue Jul 13, 2018 · 6 comments

Comments

@InnovArul

I came across a debugging scenario where ToTensor() didn't convert a numpy float array in the range [0.0, 255.0] to the range [0.0, 1.0], because of the following lines:
https://github.com/pytorch/vision/blob/master/torchvision/transforms/functional.py#L50-L53

Basically, this API assumes that all float arrays are already in the range [0.0, 1.0].
Do you think we should change this behavior?
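A simplified model of the dtype-based rule described above, written in NumPy so it runs without torchvision. This is a sketch under assumptions, not torchvision's actual code: the name `to_tensor_like` is hypothetical and it assumes HxWxC input. Only uint8 data is divided by 255; any other dtype is just transposed, so a float array in [0.0, 255.0] comes out unscaled.

```python
import numpy as np


def to_tensor_like(pic: np.ndarray) -> np.ndarray:
    """Hypothetical NumPy model of the dtype-based rule discussed in this issue.

    Assumes an HxWxC array. Only uint8 input is rescaled; every other dtype
    is transposed to CxHxW and returned with its values untouched.
    """
    img = pic.transpose((2, 0, 1))  # HWC -> CHW
    if img.dtype == np.uint8:
        return img.astype(np.float32) / 255.0  # bytes: rescale to [0, 1]
    return img  # floats: assumed to already be in [0, 1], nothing is checked


byte_img = np.full((2, 2, 3), 255, dtype=np.uint8)
float_img = np.full((2, 2, 3), 255.0, dtype=np.float32)

print(to_tensor_like(byte_img).max())   # 1.0
print(to_tensor_like(float_img).max())  # 255.0, silently left unscaled
```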

@fmassa
Member

fmassa commented Jul 13, 2018

I think I'll be creating an Image class that can hold a few different types (PIL images, numpy arrays, tensors). Its constructor will know what type of data to expect, so that we can cover all those use cases.
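One possible reading of that idea, as a hypothetical NumPy sketch: the container is told the value range when it is constructed, so conversion never has to guess it from the dtype. `RangedImage`, `value_range` and `to_unit_chw` are illustrative names, not an existing torchvision API, and the sketch assumes HxWxC input.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class RangedImage:
    """Hypothetical container: the caller states the value range up front."""

    data: np.ndarray                   # HxWxC array from any reader
    value_range: Tuple[float, float]   # e.g. (0.0, 255.0) or (0.0, 1.0)

    def __post_init__(self) -> None:
        lo, hi = self.value_range
        if hi <= lo:
            raise ValueError("value_range must be (low, high) with low < high")

    def to_unit_chw(self) -> np.ndarray:
        """Return a float32 CxHxW array rescaled from value_range to [0, 1]."""
        lo, hi = self.value_range
        chw = self.data.transpose((2, 0, 1)).astype(np.float32)
        return (chw - lo) / (hi - lo)


a = RangedImage(np.full((4, 4, 3), 255.0), value_range=(0.0, 255.0))
b = RangedImage(np.full((4, 4, 3), 1.0), value_range=(0.0, 1.0))
print(a.to_unit_chw().max(), b.to_unit_chw().max())  # 1.0 1.0
```

Because the range is explicit, a float array in [0.0, 255.0] and one in [0.0, 1.0] both end up in [0, 1] without any per-call inspection of the pixel values.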

@InnovArul
Author

InnovArul commented Jul 13, 2018

Or we could create a universal API that reads an image directly into a default format, given the file path? I am not sure whether this would help. I feel that we (users) read images with different packages (PIL, scipy, skimage, OpenCV, etc.), so there are more cases to cover.

@ekagra-ranjan
Contributor

ekagra-ranjan commented Mar 10, 2019

@fmassa This issue really confused me as a beginner in PyTorch :) What if we replace lines 57-62 with:

```python
img = torch.from_numpy(pic.transpose((2, 0, 1)))
# backward compatibility
if isinstance(img, torch.ByteTensor):
    return img.float().div(255)
else:
    img = img.float()
    # heuristic: a maximum of at most 1.0 is taken to mean the data is already in [0, 1]
    if torch.max(img) <= 1.0:
        return img
    else:
        return img.div(255)
```

@fmassa
Member

fmassa commented Mar 11, 2019

@ekagra-ranjan we could try doing something like that, but the cost of torch.max(img) is not negligible and would slow down many things that rely on ToTensor with floating-point data.

I think the underlying issue is that ToTensor tries to do way too many things internally, and it can't suit all needs.
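One way to read that point is to split the conversion into steps that each do one thing. A minimal NumPy sketch, assuming two hypothetical helpers `hwc_to_chw_float` and `rescale` (not torchvision functions): the layout/dtype step never inspects values, and rescaling happens only when the caller states the source range, so no `torch.max`-style reduction over the image is needed.

```python
import numpy as np


def hwc_to_chw_float(pic: np.ndarray) -> np.ndarray:
    """Layout and dtype only: HxWxC -> CxHxW float32, values untouched."""
    return np.ascontiguousarray(pic.transpose((2, 0, 1)), dtype=np.float32)


def rescale(img: np.ndarray, max_value: float) -> np.ndarray:
    """Value range only: divide by a max_value the caller states explicitly."""
    return img / np.float32(max_value)


pic = np.full((2, 2, 3), 255.0)          # float64 data in [0, 255]
out = rescale(hwc_to_chw_float(pic), max_value=255.0)
print(out.shape, out.dtype, out.max())   # (3, 2, 2) float32 1.0
```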

@Melika-Ayoughi

I found the same bug while loading the Moving MNIST data. My input is not float, but the same bug still exists: ToTensor() does not rescale the values from [0, 255] to [0, 1], and it does not give any warning either, so I only found it while debugging my own neural net.
Is there any clean solution for that?
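A possible workaround until the transform changes, sketched in NumPy under the assumption that you know your data (e.g. the Moving MNIST frames) lies in [0, 255]: rescale explicitly before calling ToTensor(), so it receives a float array already in [0.0, 1.0], which it leaves as is. `scale_0_255` is a hypothetical helper, not part of torchvision; unlike ToTensor(), it raises when the range assumption is violated instead of failing silently.

```python
import numpy as np


def scale_0_255(arr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map data known to be in [0, 255] to float32 [0, 1].

    Raises instead of silently passing values through, so a wrong assumption
    about the input range shows up immediately rather than during training.
    """
    arr = np.asarray(arr, dtype=np.float32)
    if arr.min() < 0.0 or arr.max() > 255.0:
        raise ValueError(
            f"expected values in [0, 255], got [{arr.min()}, {arr.max()}]"
        )
    return arr / 255.0


frames = np.array([[0, 128], [255, 64]], dtype=np.uint8)
print(scale_0_255(frames).max())  # 1.0
```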

@fmassa
Member

fmassa commented Jul 9, 2019

@Melika-Ayoughi I'm not sure there is a clean solution if we keep the current approach.
We can do workarounds like the one @ekagra-ranjan mentioned, but they will not be a complete fix.

In my opinion, this is something that should be completely redesigned.
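To make concrete why a max()-based workaround cannot be a complete fix: the same values are ambiguous. A minimal NumPy model of the heuristic proposed earlier in the thread (the name `heuristic_scale` is hypothetical, and this is not torchvision code) treats a very dark image on the [0, 255] scale as if it were already in [0, 1]. It also reduces over every pixel on each call, which is the cost concern raised earlier.

```python
import numpy as np


def heuristic_scale(img: np.ndarray) -> np.ndarray:
    """NumPy model of the max()-based proposal in this thread (not torchvision code)."""
    img = img.astype(np.float32)
    if img.max() <= 1.0:
        return img          # guessed to be in [0, 1] already
    return img / 255.0      # guessed to be in [0, 255]


# A nearly black image whose pixels really live on the [0, 255] scale.
dark = np.array([[0.0, 1.0], [0.5, 1.0]], dtype=np.float32)
print(heuristic_scale(dark).max())  # 1.0: left unscaled, should be about 0.0039
```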


4 participants