Adds normalize parameter to ToTensor operation #2060
Conversation
Adds a `normalize` parameter to `ToTensor`; defaults to `True` for backwards compatibility.
Hi @xksteven, I disagree with your premise.

The native image type of … One could then ask why we use values from this interval and not … Since … If you want to use the `[0, 255]` value range, you could write a small transform such as

```python
class FloatToUint8ValueRange:
    def __call__(self, img):
        return img * 255.0
```

and bundle it after `ToTensor`:

```python
from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor(), FloatToUint8ValueRange()])
```

If we want to include this, I suggest that we at least change the name of the parameter, since …
Hello @pmeier,

One particular use case I was dealing with is loading semantic segmentation maps, which are conveniently stored as images and can be loaded nicely as labels with this transform except for the division by 255. I thought that making the division optional would allow loading these and other kinds of images whose maximum value is not necessarily 255. The parameter would be much more convenient than forcing users to add another transform. I agree that the word "normalize" could be confusing; I thought "standardize" would be worse. Do you have an idea of what it should be called? Thanks for your comments!
I'm not sure if I understood you right here, but I fail to see why the segmentation maps have to be in the interval …
From this PR I think you load the images with …
If I had to, and I'm not in favor of that, I would go with something like …
The semantic segmentation maps are themselves the labels, per pixel. So each pixel value represents a class encoded by an integer value 0, 1, 2, …, 255. This can be loaded and then passed directly into the loss function (the values might need to be converted to type `long` first in some cases).
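As a rough illustration of this use case (the file name `mask.png` and the single-channel layout here are assumptions), such a map can be loaded as integer class ids with no division by 255:

```python
import numpy as np
import torch
from PIL import Image

# Hypothetical mask file storing one class index (0-255) per pixel.
mask = Image.open("mask.png")
labels = torch.as_tensor(np.array(mask), dtype=torch.long)  # H x W tensor of class ids
# `labels` can then be fed to a loss such as nn.CrossEntropyLoss together with the model output.
```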
Honestly, I agree. This is a very small pull request, and it introduces minimal changes that I personally would like but that are perhaps suboptimal. I can't imagine it being a bottleneck; I just thought it would be better to make what the ToTensor operation does slightly more transparent. It felt weird to me that this one function does several things at once: not only does it convert images into a PyTorch tensor, it also normalizes them into the [0.0, 1.0] range. The main reason it felt weird is that the name ToTensor doesn't imply the normalization; you have to look at the documentation for that part to become apparent. I thought it might be useful to keep the first behavior (converting images to tensors) and make the second (the normalization) optional. If the changes are too small and inconvenient, I'm open to closing the pull request. (I accidentally clicked close earlier.)
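As a quick illustration of that coupling (a tiny sketch; the constant image is arbitrary), `ToTensor` changes both the type and the value range in one step:

```python
import numpy as np
from PIL import Image
from torchvision import transforms

img = Image.fromarray(np.full((2, 2, 3), 255, dtype=np.uint8))  # uint8 RGB image, values in [0, 255]
t = transforms.ToTensor()(img)
print(t.dtype, t.max())  # prints: torch.float32 tensor(1.), i.e. converted and divided by 255
```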
Codecov Report
```diff
@@            Coverage Diff            @@
##           master    #2060      +/-  ##
=========================================
- Coverage    0.48%    0.48%   -0.01%
=========================================
  Files          92       92
  Lines        7411     7415       +4
  Branches     1128     1130       +2
=========================================
  Hits           36       36
- Misses       7362     7366       +4
  Partials       13       13
```
If you need to convert them to …
I think right now we aren't communicating the native image type of …
Don't give up yet: although I'm in favor of closing this PR, I have no say here. We'll have to wait for a member of PyTorch to drop in and settle this.
Maybe there could be an additional parameter to … the resulting output tensor? Alternatively, the ToTensor operation could avoid automatically changing the default input types (int, byte, double) into float. I'll see if I can think of other approaches.

If one does …, information about the mode has been lost in the process, and it does not return the same PIL Image that one started with. I believe it may even result in an error in some cases (although I'd have to do more tests to confirm that it throws an error).
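For reference, a minimal sketch of the kind of round trip being described (using a palette-mode image as the example, which is an assumption; other modes behave differently): converting to a tensor and back does not restore the original mode.

```python
import numpy as np
from PIL import Image
from torchvision import transforms

palette_img = Image.fromarray(np.arange(256, dtype=np.uint8).reshape(16, 16)).convert("P")
tensor = transforms.ToTensor()(palette_img)   # 1 x 16 x 16 float32 in [0, 1]
restored = transforms.ToPILImage()(tensor)    # comes back as a single-channel "L" image
print(palette_img.mode, restored.mode)        # prints: P L (the palette information is gone)
```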
Hi, thanks for the PR! I agree with @pmeier's comments, and this is something that has come up a few times already. The underlying issue is that the current contract of `ToTensor` is unclear; I think we should re-design this abstraction altogether, so that the contract is clear from the beginning. I propose to split it into two operations, along the lines of an `as_tensor` conversion and a `convert_image_dtype` step.

This way, the semantics of the functions are clear, and there are no surprises for the users. Also, when the image reading PRs are merged in torchvision, the …

Thoughts?
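As a rough sketch of what such a split could look like (the names `as_tensor` and `convert_image_dtype` come from this thread; the signatures and behavior below are assumptions, not the eventual torchvision implementation):

```python
import numpy as np
import torch
from PIL import Image

def as_tensor(pic):
    """Convert a PIL Image to a C x H x W tensor, keeping dtype and value range."""
    arr = np.array(pic)
    if arr.ndim == 2:
        arr = arr[:, :, None]
    return torch.from_numpy(arr).permute(2, 0, 1).contiguous()

def convert_image_dtype(image, dtype=torch.float32):
    """Convert an image tensor to `dtype`, rescaling values (only uint8 <-> float here)."""
    if image.dtype == dtype:
        return image
    if image.dtype == torch.uint8 and dtype.is_floating_point:
        return image.to(dtype).div(255)
    if image.is_floating_point() and dtype == torch.uint8:
        return image.mul(255).round().to(dtype)
    raise NotImplementedError(f"{image.dtype} -> {dtype} is not covered by this sketch")
```

Chained as `Compose([as_tensor, convert_image_dtype])` this would roughly reproduce today's `ToTensor`, while `as_tensor` on its own would leave segmentation maps untouched.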
I like this approach.
I think convert_image_dtype should not also do the normalization. It should only do the potential scaling from the max of one dtype to another, and, as TensorFlow does, add a parameter called saturate for users who wish to avoid overflow or underflow problems. Those occur, for example, when converting from the maximum float value to the maximum int value. I can work on adding the as_tensor code in a separate pull request and can close this pull request whenever; I don't think this PR fully addresses the core issues anyway. I think convert_image_dtype will be a bit more involved coding-wise, but I'd be happy to take a look after I or someone else finishes the first one. I think we'd need a few helper functions to deal with the saturating cast, and I'm not sure where to best put them.
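For context, a minimal sketch of what such a saturating cast helper might look like (the name `saturating_cast` and its placement are assumptions; the `saturate` flag of TensorFlow's `tf.image.convert_image_dtype` is the inspiration mentioned above):

```python
import torch

def saturating_cast(image, dtype):
    """Cast `image` to `dtype`, clamping to the target's representable range first
    so out-of-range values saturate instead of wrapping around."""
    if not dtype.is_floating_point:
        info = torch.iinfo(dtype)
        image = image.clamp(min=info.min, max=info.max)
    return image.to(dtype)
```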
If … If you just want to change the …
You're right, they do the scaling. From their documentation it was not apparent to me that they were doing the scaling, because MAX was described as the largest positive representable number.
I'll withdraw my criticism, as it appears everyone else is doing the conversion and normalization, and the `as_tensor` function would handle my own use cases as well.
@pmeier, correct about your first point. One thing to consider is whether we want to do the transposition from HWC to CHW in …

About your second point:
Partially, as the user can specify whether they want to convert to a float type (the default) or a different type. I would be happy to review PRs addressing those two points. And for …
I'll close the pull request and begin working on the new function :)
The divide by 255 should be an optional step taken when converting to a PyTorch tensor. However, the parameter should default to True to preserve backwards compatibility.
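As a final illustration, a minimal sketch of the behavior described above (a simplified stand-in for the proposal, not the actual code in this PR; PIL handling and other dtypes are glossed over):

```python
import numpy as np
import torch

class ToTensor:
    """Convert an H x W x C uint8 array (or PIL image via np.array) to C x H x W;
    optionally rescale to [0.0, 1.0]."""

    def __init__(self, normalize=True):
        # Defaults to True so existing callers keep getting float tensors in [0, 1].
        self.normalize = normalize

    def __call__(self, pic):
        arr = np.array(pic)
        if arr.ndim == 2:
            arr = arr[:, :, None]
        img = torch.from_numpy(arr).permute(2, 0, 1).contiguous()
        if self.normalize and img.dtype == torch.uint8:
            img = img.float().div(255)
        return img
```

With `normalize=False`, a segmentation map would come back as raw integer class ids instead of values scaled into [0, 1].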