Change draw_segmentation_masks to accept boolean masks #3820
Closed
11 commits
53cd516 WIP (NicolasHug)
9cf6247 rm images (NicolasHug)
d7de81e cleanup (NicolasHug)
c598c1f Merge branch 'master' into draw_seg_mask (NicolasHug)
e52fb08 pep (NicolasHug)
07b6147 Merge branch 'draw_seg_mask' of github.com:NicolasHug/vision into dra… (NicolasHug)
131f1a4 temporarily remove mask example (NicolasHug)
7876d47 Add comment about float images expected in 0-1 (NicolasHug)
e0e0fc6 remove debug stuff (NicolasHug)
5d287db Put back None testing (NicolasHug)
f91d070 more robust test (NicolasHug)
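For context, a minimal usage sketch of what this PR targets: boolean masks passed directly to torchvision.utils.draw_segmentation_masks. The dummy tensors and the exact call below are illustrative, not code from the PR.

```python
import torch
from torchvision.utils import draw_segmentation_masks

# Dummy uint8 image and a stack of boolean masks (one per instance or class).
img = torch.randint(0, 256, (3, 240, 320), dtype=torch.uint8)
proba_masks = torch.rand(4, 240, 320)   # e.g. per-class probabilities from a model
bool_masks = proba_masks > 0.5          # boolean masks, as this PR allows

# Overlay the masks on the image.
overlay = draw_segmentation_masks(img, bool_masks, alpha=0.7)
```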
I wonder if we have to support anything other than uint8 here, given that this method is used for visualization. To be specific, the only issue I have with this is that you are being forced to assume that the colour needs to be between 0-1, hence divide by 255. Thoughts?
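For context, the assumption being questioned here is roughly the following (a hypothetical sketch, not the PR's actual code): integer RGB colors get divided by 255 whenever the image is a float expected to be in [0, 1].

```python
import torch

def scale_color(color, image_dtype):
    # `color` is an (R, G, B) tuple of ints in [0, 255].
    color = torch.tensor(color, dtype=torch.float64)
    if image_dtype == torch.uint8:
        return color        # uint8 images use the 0-255 scale directly
    return color / 255      # float images are assumed to be in [0, 1]
```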
I thought so too (I will have a thorough look tomorrow!). The other utility, draw_bounding_boxes, returns uint8 dtype only. So perhaps we can be consistent either way.
The current interface takes a uint8 image as input and outputs a float image. (EDIT: this isn't the case; in master the function properly returns a uint8 image.) This means that you can't call draw_segmentation_masks on that output again, which isn't super practical.
Also, the models accept float images, not uint8 images. If you look at the current example https://pytorch.org/vision/master/auto_examples/plot_visualization_utils.html#sphx-glr-auto-examples-plot-visualization-utils-py we're forced to keep two copies of each image: one float and one uint8. This isn't optimal either.
Supporting both float and uint8, and having the dtypes "pass through", solves both of these issues for little maintenance cost: I wrote tests that ensure both dtypes yield the same results.
Regarding consistency, I'm planning on modifying draw_bounding_boxes to have the same behavior.
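A rough illustration of that "pass through" behaviour (overlay_mask is a hypothetical helper, not the code in this PR): the output keeps the dtype of the input, and a uint8 image and its float [0, 1] counterpart produce the same visual result up to rounding.

```python
import torch

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` over `image` where `mask` is True, preserving the input dtype."""
    out_dtype = image.dtype
    img = image.to(torch.float64)
    col = torch.tensor(color, dtype=torch.float64)
    if out_dtype != torch.uint8:
        col = col / 255  # float images are assumed to be in [0, 1]
    col = col[:, None, None]
    blended = torch.where(mask, (1 - alpha) * img + alpha * col, img)
    if out_dtype == torch.uint8:
        return blended.round().to(out_dtype)
    return blended.to(out_dtype)

# Same masks, same colors: uint8 in -> uint8 out, float in -> float out.
img_u8 = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)
mask = torch.zeros(4, 4, dtype=torch.bool)
mask[:2] = True
out_u8 = overlay_mask(img_u8, mask)
out_f = overlay_mask(img_u8.float() / 255, mask)
assert torch.allclose(out_u8.float() / 255, out_f, atol=1 / 255)
```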
It is true that the models receive float images that are also typically normalized. These are not the versions of the images that you will use for visualization, though. Most models scale to [0, 1] while others recently require scaling to [-1, 1] (see the SSD models). So in the end the user will need to keep around the non-normalized uint8 version to visualize it.
An alternative course of action might be to continue handling all types of images but avoid any assumption about the color scales. You could do that by creating a palette only if the image is uint8, and expect the user to provide a valid list of colors in the right scale otherwise. Happy to discuss this more to unblock your work.
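A hypothetical sketch of that alternative, for discussion: generate a default palette only when the input is uint8 (where the 0-255 scale is known), and require explicit colors otherwise. The pick_colors helper below is illustrative, not code from this PR.

```python
import torch

def pick_colors(image, num_masks, colors=None):
    if colors is not None:
        return colors
    if image.dtype == torch.uint8:
        # A default palette only makes sense when the 0-255 scale is known.
        generator = torch.Generator().manual_seed(0)
        return [tuple(torch.randint(0, 256, (3,), generator=generator).tolist())
                for _ in range(num_masks)]
    raise ValueError("For non-uint8 images, pass `colors` in the image's own scale.")
```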
According to #3774 (comment), Mask-RCNN and Faster-RCNN don't require normalization, so one could use the same non-normalized float images in this case, I think.
IIUC, the main concern is the assumption that if an image is a float, then we expect it to be in [0, 1]. According to our docs, this assumption already holds for the transforms. It also holds when calling F.convert_image_dtype(img, torch.float) and, as far as I can tell, all our other conversion utilities, so it seems to be a reasonable assumption. I'll add a comment in the docstring to clarify that.
The worst thing that can happen if the user passes a float image that's not in [0, 1] is that the colors will be slightly off.
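For instance, the uint8-to-float conversion path lands in [0, 1] (a small illustrative snippet using the existing torchvision API; the tensor values are made up):

```python
import torch
from torchvision.transforms import functional as F

img_uint8 = torch.randint(0, 256, (3, 4, 4), dtype=torch.uint8)
img_float = F.convert_image_dtype(img_uint8, torch.float)

# convert_image_dtype rescales, so the float image lands in [0, 1].
assert img_float.min() >= 0 and img_float.max() <= 1
assert torch.allclose(img_float, img_uint8 / 255, atol=1e-6)
```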
In other words: as far as I can tell, removing the assumption that float images are in [0, 1] for draw_segmentation_masks will not facilitate any use-case, and it won't make anything worse either.
OTOH, removing this assumption forces users to keep the uint8 image around, or it forces them to manually pass colors. By seamlessly allowing float images in [0, 1], users only need to keep the float image in [0, 1], and possibly the normalized image to pass to a model. They can discard the uint8 image.
BTW, this idea that float images are implicitly expected to be in 0-1 is largely adopted throughout the ecosystem. Matplotlib assumes this too: see below how both images are rendered exactly the same. By allowing this, we're making our tools easily usable by users who are used to the existing state of things.
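As a quick illustration of that convention (a stand-in for the screenshot this comment referred to; the random data is made up):

```python
import numpy as np
import matplotlib.pyplot as plt

img_uint8 = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
img_float = img_uint8 / 255.0   # float image in [0, 1]

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(img_uint8)   # interpreted on the 0-255 scale
ax2.imshow(img_float)   # interpreted on the 0-1 scale -> renders identically
plt.show()
```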
My point here is about keeping this assumption contained and not exposing it to other places of the library. Whether we should change the behaviour of ToTensor to remove the silent rescaling is something we could discuss for TorchVision v1.0.
This is precisely why the original versions of these methods supported only uint8 images, and that's why we can't just throw away the uint8 versions of the images. The segmentation models rescale and normalize the images outside of the models, meaning you would have to undo those transformations before visualizing.
At this point I think it would be best not to merge the PR without the Gallery examples, which would give us a clear view of the usage and how corner cases are handled. Perhaps @fmassa can weigh in here.
Users have to convert their images to float in [0, 1] at some point.
So it's not just something contained in the transforms. It's pervasive throughout the library, and throughout the entire ecosystem. By supporting those images directly, we allow users to drop the uint8 images and only keep the float images (normalized and non-normalized versions).
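A sketch of the workflow this argues for, assuming the float [0, 1] support proposed in this PR; the variable names, the mean/std values, and the model call are placeholders:

```python
import torch
from torchvision.transforms import functional as F
from torchvision.utils import draw_segmentation_masks

img_uint8 = torch.randint(0, 256, (3, 240, 320), dtype=torch.uint8)  # e.g. a decoded JPEG
img = F.convert_image_dtype(img_uint8, torch.float)   # float in [0, 1]: the only copy we keep
img_normalized = F.normalize(img, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# masks = model(img_normalized[None]) ...  (stand-in below for real model output)
masks = torch.rand(3, 240, 320) > 0.5

# With float support, the [0, 1] image can be visualized directly;
# the uint8 copy is no longer needed.
overlay = draw_segmentation_masks(img, masks, alpha=0.7)
```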
Still WIP, but see also https://github.com/pytorch/vision/pull/3824/files#r631748970 to illustrate why and how we can drop references to uint8 images once we support floats in 0-1.