Add U-Net model #899
Thanks for the PR!
We are currently focusing on COCO / Pascal for semantic segmentation tasks. I'll be uploading pre-trained weights for those models soon, and they have been trained with the
My thought on this is that we should create a folder
Also, at least for Pascal / COCO, we generally allow the model to take arbitrary backbones, so that one can switch from resnet to resnext for example, and reuse pre-trained weights. It seems that your implementation here doesn't follow this pattern?
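To make the "arbitrary backbone" pattern concrete, here is a minimal sketch. `TinyBackbone` and `SegmentationModel` are hypothetical names invented for illustration, not torchvision classes; the only assumption is that a backbone exposes an `out_channels` attribute so the head can be sized to match.

```python
import torch
from torch import nn
import torch.nn.functional as F


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a ResNet-style feature extractor."""

    def __init__(self, in_channels=3, width=16):
        super().__init__()
        self.out_channels = width * 2
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, self.out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)


class SegmentationModel(nn.Module):
    """Hypothetical wrapper: any backbone exposing `out_channels` plugs in."""

    def __init__(self, backbone, num_classes):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Conv2d(backbone.out_channels, num_classes, kernel_size=1)

    def forward(self, x):
        size = x.shape[-2:]
        logits = self.classifier(self.backbone(x))
        # Upsample the coarse logits back to the input resolution.
        return F.interpolate(logits, size=size, mode="bilinear", align_corners=False)


# Swapping the backbone does not require touching the segmentation head.
model = SegmentationModel(TinyBackbone(width=16), num_classes=21)
wider = SegmentationModel(TinyBackbone(width=32), num_classes=21)
x = torch.randn(1, 3, 64, 64)
out_shape = tuple(model(x).shape)
wider_shape = tuple(wider(x).shape)
```

Because the backbone is a separate module, its pre-trained weights could be loaded independently of the head.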
That's fine by me, no rush from my end.
I agree, I'm fine with waiting until you finish your detection PR. I think there should be a folder for classification, object_detection, semantic_segmentation, and instance_segmentation.
I'm not sure if U-Nets fit that description. ResNets usually condense an image down to a single pixel (with multiple channels), followed by fully connected layers. U-Nets condense an image to a smaller 32x32 grid, then upsample the image to its original resolution.
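For reference, the spatial sizes can be traced with plain arithmetic. This sketch assumes the layout in the original U-Net paper (572x572 input, unpadded 3x3 convolutions, 2x2 pooling and up-convolution, four levels); `unet_sizes` is a hypothetical helper. A standard ImageNet ResNet, by comparison, has a total stride of 32 before global average pooling.

```python
def unet_sizes(size=572, depth=4):
    """Trace the spatial size through a U-Net with unpadded 3x3 convs."""
    trace = [size]
    for _ in range(depth):
        size -= 4   # two unpadded 3x3 convs each trim 2 pixels
        size //= 2  # 2x2 max-pool with stride 2
        trace.append(size)
    size -= 4       # two bottleneck convs
    trace.append(size)
    for _ in range(depth):
        size *= 2   # 2x2 up-convolution doubles the resolution
        size -= 4   # two unpadded 3x3 convs
        trace.append(size)
    return trace


trace = unet_sizes()
# ResNet-style encoder: 224 input, total stride 32, then global average pooling.
resnet_grid = 224 // 32
```

With these assumptions the grid after the fourth pooling step is 32x32, and with unpadded convolutions the output map is smaller than the input.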
I'm not familiar with U-Net; however, it seems people prefer changing the architecture by using Conv2d with stride=2.
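A small sketch of what that change could look like, assuming the comment refers to the downsampling step. The channel count and kernel settings here are illustrative choices, not values from the PR.

```python
import torch
from torch import nn

x = torch.randn(1, 8, 64, 64)

# Fixed downsampling: no learnable parameters.
pool = nn.MaxPool2d(kernel_size=2)
# Learned downsampling: a stride-2 convolution that halves the resolution.
strided = nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1)

pooled_shape = tuple(pool(x).shape)
strided_shape = tuple(strided(x).shape)
n_pool_params = sum(p.numel() for p in pool.parameters())
n_conv_params = sum(p.numel() for p in strided.parameters())
```

Both layers halve the spatial resolution; the strided convolution adds weights that are trained along with the rest of the network.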
It would be great if you could answer my query about the out_channels arg.
"""`U-Net <https://arxiv.org/pdf/1505.04597.pdf>`_ architecture. | ||

Args:
    in_channels (int, optional): number of channels in input image
So the reason I chose in_channels=1 as the default for U-Net is that this is how the original U-Net paper is modeled, using a single-channel grayscale microscope imagery dataset (see #900). The application I needed it for was actually 4-channel microscope imagery, but unfortunately PIL doesn't support this (see #882). If we decide to pretrain this on COCO/Pascal, I'm fine with switching the default to 3-channel.
It's pretty common for the encoder half of the network to be based on a standard backbone like ResNet or VGG. If I recall, the original U-Net paper was pretty much a VGG net on the encoder side already. For the decoder, many find that using upsampling provides better results than transpose convolutions. I've seen a few implementations that allow choosing either when the model is constructed.
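A rough sketch of a decoder step that lets the caller pick either option at construction time. `make_upsampler` and its `mode` argument are hypothetical names for illustration and are not part of this PR or of torchvision.

```python
import torch
from torch import nn


def make_upsampler(in_channels, out_channels, mode="transpose"):
    """Hypothetical helper: build a 2x upsampling step for a decoder block."""
    if mode == "transpose":
        # Learned upsampling with a 2x2 transposed convolution, stride 2.
        return nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)
    if mode == "upsample":
        # Fixed bilinear interpolation followed by a regular convolution.
        return nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        )
    raise ValueError(f"unknown mode: {mode!r}")


x = torch.randn(1, 64, 16, 16)
shapes = {
    mode: tuple(make_upsampler(64, 32, mode)(x).shape)
    for mode in ("transpose", "upsample")
}
```

Both variants double the spatial size and halve the channel count here, so the rest of the decoder would not need to know which one was chosen.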
Further to my previous comment, some PyTorch U-Nets with support for different backbones and the mentioned upsampling decoder blocks.
It'd be nice to have a normalization layer option, ideally flexible like TernausNet. There is a worthwhile reference in both of Vladimir's TernausNet implementations about the transpose conv and the potential artifacts resulting from it when implemented as in the original paper.
@fmassa What's the status of this PR? Is this something that is still wanted, or should I close it?
It seems like this PR has been abandoned by upstream, so I'm going to close it. Feel free to reopen or steal these commits to make a new PR.
This PR adds the U-Net model to torchvision. U-Net is very popular in image segmentation, especially in the biomedical imaging space.

Questions:
segmentation.py? This seems cumbersome. Any suggestions for how to improve this?