Add U-Net model #899

Closed · wants to merge 3 commits

Conversation

adamjstewart
Contributor

@adamjstewart adamjstewart commented May 11, 2019

This PR adds the U-Net model to torchvision. U-Net is very popular in image segmentation, especially in the biomedical imaging space.

Questions:

  • What's the requirement for pretrained models? I'm not sure what the standard dataset for image segmentation is, or which hyperparameters would work best. I just tried to follow the original U-Net paper as best I could.
  • The unit tests seem to be failing. It looks like all segmentation models are supposed to be defined in segmentation.py? This seems cumbersome. Any suggestions for how to improve this?
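For readers unfamiliar with the architecture, here is an illustrative sketch (not the code in this PR) of the characteristic U-Net building block: two 3x3 convolutions, each followed by ReLU. The original paper uses unpadded convolutions; `padding=1` here keeps the spatial size for brevity.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convs with ReLU, the repeated unit of the U-Net encoder/decoder.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

block = double_conv(1, 64)
x = torch.randn(1, 1, 64, 64)
print(block(x).shape)  # torch.Size([1, 64, 64, 64])
```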

@fmassa
Member

fmassa commented May 14, 2019

Thanks for the PR!

What's the requirement for pretrained models? I'm not sure what the standard dataset for image segmentation is, or which hyperparameters would work best. I just tried to follow the original U-Net paper as best I could.

We are currently focusing on COCO / Pascal for semantic segmentation tasks. I'll be uploading pre-trained weights for those models soon; they have been trained with the references/segmentation/train.py script available in torchvision.
It is a requirement to have pre-trained weights, and they should match the reported accuracies within a few %.
I could try giving it a shot and training the model, but I don't think I'll have time before July.

The unit tests seem to be failing. It looks like all segmentation models are supposed to be defined in segmentation.py? This seems cumbersome. Any suggestions for how to improve this?

My thinking on this is that we should create a segmentation folder and move all semantic segmentation models there. This will require a bit of refactoring, but I'm considering doing it anyway, following the detection PR that I'm preparing.

@fmassa
Member

fmassa commented May 14, 2019

Also, at least for Pascal / COCO, we generally allow the model to take arbitrary backbones, so that one can switch from resnet to resnext for example, and reuse pre-trained weights. It seems that your implementation here doesn't follow this pattern?

@adamjstewart
Contributor Author

I could try giving it a shot and training the model, but I don't think I'll have time before July.

That's fine by me, no rush from my end.

My thinking on this is that we should create a segmentation folder and move all semantic segmentation models there. This will require a bit of refactoring, but I'm considering doing it anyway, following the detection PR that I'm preparing.

I agree, I'm fine with waiting until you finish your detection PR. I think there should be a folder for classification, object_detection, semantic_segmentation, and instance_segmentation.

Also, at least for Pascal / COCO, we generally allow the model to take arbitrary backbones, so that one can switch from resnet to resnext for example, and reuse pre-trained weights. It seems that your implementation here doesn't follow this pattern?

I'm not sure if U-Nets fit that description. ResNets usually condense an image down to a single pixel (with multiple channels), followed by fully connected layers. U-Nets condense an image to a smaller 32x32 grid, then upsample the image to its original resolution.
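To make the shape difference concrete, a toy sketch (made-up channel counts, not the PR's code): the network contracts, passes through a spatial bottleneck, then expands back to full resolution, concatenating encoder features on the way up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    # Toy 1-level U-Net: contract, bottleneck, expand with a skip connection.
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, 16, 3, padding=1)
        self.down = nn.MaxPool2d(2)                        # halve resolution
        self.bottleneck = nn.Conv2d(16, 32, 3, padding=1)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # double resolution
        self.dec = nn.Conv2d(32, 16, 3, padding=1)         # 32 = 16 skip + 16 up
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        e = F.relu(self.enc(x))                     # (N, 16, H, W)
        b = F.relu(self.bottleneck(self.down(e)))   # (N, 32, H/2, W/2)
        u = self.up(b)                              # (N, 16, H, W)
        d = F.relu(self.dec(torch.cat([e, u], dim=1)))  # skip connection
        return self.head(d)                         # full-resolution logits

out = TinyUNet()(torch.randn(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 2, 64, 64])
```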


@Zhaoyi-Yan Zhaoyi-Yan left a comment


I'm not familiar with U-Net; however, it seems people prefer changing the architecture to use Conv2d with stride=2 for downsampling.
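For what it's worth, an illustrative sketch of the two options: the original U-Net downsamples with 2x2 max pooling, while the suggested alternative is a learned, strided convolution. Both halve the spatial size.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 128, 128)

# Original U-Net downsampling: parameter-free max pooling.
pooled = nn.MaxPool2d(kernel_size=2)(x)

# Suggested alternative: a learned, strided convolution.
strided = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)(x)

print(pooled.shape, strided.shape)  # both torch.Size([1, 64, 64, 64])
```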

Contributor

@ekagra-ranjan ekagra-ranjan left a comment


It would be great if you could answer my query about the out_channels arg.

```python
"""`U-Net <https://arxiv.org/pdf/1505.04597.pdf>`_ architecture.

Args:
    in_channels (int, optional): number of channels in input image
```
Contributor Author


So the reason I chose in_channels=1 as the default for U-Net is that this is how the original U-Net paper models it, using a single-channel grayscale microscopy dataset (see #900). The application I needed it for was actually 4-channel microscope imagery, but unfortunately PIL doesn't support this (see #882). If we decide to pretrain this on COCO/Pascal, I'm fine with switching the default to 3 channels.

@rwightman
Contributor

Also, at least for Pascal / COCO, we generally allow the model to take arbitrary backbones, so that one can switch from resnet to resnext for example, and reuse pre-trained weights. It seems that your implementation here doesn't follow this pattern?

I'm not sure if U-Nets fit that description. ResNets usually condense an image down to a single pixel (with multiple channels), followed by fully connected layers. U-Nets condense an image to a smaller 32x32 grid, then upsample the image to its original resolution.

It's pretty common for the encoder half of the network to be based on a standard backbone like ResNet or VGG. If I recall correctly, the original U-Net paper's encoder side was already pretty much a VGG-style net.

For the decoder, many find that upsampling provides better results than transpose convolutions. I've seen a few implementations that allow choosing either when the model is constructed.
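A common way to expose that choice (a hypothetical sketch, not code from any of the implementations mentioned): the decoder block takes a flag selecting either interpolation followed by a conv, or a learned transpose convolution.

```python
import torch
import torch.nn as nn

def up_block(in_ch, out_ch, mode="upsample"):
    """Hypothetical decoder block: 2x upsampling by either
    interpolation + conv (often fewer checkerboard artifacts)
    or a learned transpose convolution."""
    if mode == "upsample":
        return nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        )
    return nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)

x = torch.randn(1, 128, 16, 16)
for mode in ("upsample", "transpose"):
    y = up_block(128, 64, mode)(x)
    print(mode, y.shape)  # both give torch.Size([1, 64, 32, 32])
```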

@rwightman
Contributor

Further to my previous comment: some PyTorch U-Nets with support for different backbones and the aforementioned upsampling decoder blocks.

It'd be nice to have a normalization layer option, ideally flexible like TernausNet's. There is a worthwhile reference in both of Vladimir's TernausNet implementations about the transpose conv and the potential artifacts resulting from it when implemented as in the original paper.

@adamjstewart
Contributor Author

@fmassa What's the status of this PR? Is this something that is still wanted, or should I close it?

@adamjstewart
Contributor Author

It seems like this PR has been abandoned by upstream, so I'm going to close it. Feel free to reopen or steal these commits to make a new PR.

@adamjstewart adamjstewart deleted the models/unet branch July 28, 2024 19:42
5 participants