
Re usable Components and CNN Trainer #203

Closed
oke-aditya opened this issue Sep 11, 2020 · 10 comments
Labels: discussion · enhancement (New feature or request) · help wanted (Extra attention is needed) · won't fix (This will not be worked on)

oke-aditya (Contributor) commented Sep 11, 2020:

Hey all, I'm new to Bolts and fairly new to Lightning, but I've worked with PyTorch. This project is super awesome, completely in sync with my thoughts.

🚀 Feature

Torchvision supports a lot of pre-trained CNN models. One of the common tasks is fine-tuning these CNNs (transfer learning). This issue was open with the Lightning team as well, I guess. Can we create a trainer for this purpose that allows easy fine-tuning, or even training from scratch just by using pretrained = False?

The only layer that differs is the last linear layer, which is nn.Linear(hidden_dim, num_classes).

Motivation

Such pre-trained CNNs are used as backbones for object detection tasks as well. E.g. you can fine-tune a CNN on, say, animals such as tigers and lions, and then use these CNN weights as the backbone for Faster R-CNN.

This is a key feature: Lightning supports multi-GPU and TPU training out of the box for all such operations.

Pitch

One of my earlier attempts to do the same with PyTorch is here. I could successfully tune and train all the torchvision models and almost all PyTorch Image Models. It had a simple API which syncs with #200.

model = model_factory.create_torchvision_model("resnet18", num_classes=10, pretrained=True)

All these fine-tunable backbones are compatible with Faster R-CNN as well; they can simply be passed to torchvision's Faster R-CNN.

Alternatives

I'm unsure if my backbone extraction for CNNs is efficient and good. It assumes the classifier ends with nn.AdaptiveAvgPool2d() followed by a single Linear layer, and hence simply extracts all layers before this Linear. I held to this assumption since all torchvision models and PyTorch Image Models follow it. But I guess we need to assume something.

Additional Context

What this feature allows is a simple fine-tuner for CNNs, directly supported with all Lightning perks and bonuses.
A common backbone API, re-usable across all vision models such as Faster R-CNN, Mask R-CNN, and RetinaNet (in the next torchvision release).
I'm still unsure whether this can be kept extensible and maintainable.

I can probably work on this feature; I would need a bit of help as I'm new to this repo and to Lightning.

oke-aditya added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels Sep 11, 2020
github-actions (bot):

Hi! Thanks for your contribution, great first issue!

nateraw (Contributor) commented Sep 11, 2020:

Yes! I believe this is exactly where we are trying to go with things...

Model architectures that are dead simple to use and include options to:

  • load pretrained weights we provide and use them for inference or fine-tuning on your task
  • load random weights + train yourself
  • subclass + do whatever you want

This is not just limited to torchvision CNNs but is more general, I think. Torchvision stuff is a great place to start, though. If you want to give implementing this a try, I'd love to see a PR from you 😄.


Feel free to reach out to me on the slack channel and/or make a post on the forum to discuss further.

oke-aditya (Contributor, Author) commented Sep 11, 2020:

I will open a PR soon once the proposal is fine.

Transfer learning involves fine-tuning, model unfreezing, and a lot of other tweaks.

This issue is open with lightning core team here.

Bolts primarily uses fine-tunable models, e.g. Faster R-CNN.

For CNN fine-tuning, we can provide similar fine-tunable heads of linear layers.
This should be compatible with all torchvision models, and should support most other models as long as they follow a certain structure.

The proposed code is here in PyTorch; it was tested and worked for almost all torchvision models.

The assumption this code makes is that there is a single Linear layer on top of the backbone or base classifier.

These backbones are quite re-usable and can again be plugged into detection models; I tested that before.

Proposed API

Have a separate folder for backbones; this code is getting repetitive and can be reused for other models.

models
-- components
---- __init__.py
---- torchvision_backbones.py
---- any_other_components.py
-- classification
---- __init__.py
---- cnn.py

I could see a lot of places where we are using a components.py file (in upcoming PRs too).
I feel this is very generic and can be re-used irrespective of which models we are building. E.g. these CNN feature extractor components can be re-used in image captioning tasks as well as detection tasks. Encoder-decoder ResNet blocks can be re-used in VAEs, CVAEs, and some other places too.
FPNs (feature pyramid networks) can be used in both detection and segmentation.

Currently, we duplicate code for some simple blocks such as Conv3x3, etc. Components really specific to a particular model can stay with the model. But blocks like MLP and resnet_encoder are far too generic, and we could offer these as an API to end users too.

It would help if we could somehow unify re-usable components into a components folder, and keep components that are very specific to a particular application or model with the model itself.
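For instance, a generic MLP component of the sort described could be as small as this (the class and its signature are hypothetical, not the Bolts API):

```python
import torch
from torch import nn

# Illustration of a re-usable building block that many models could share
# rather than re-implementing inline.
class MLP(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

out = MLP(4, 16, 2)(torch.randn(3, 4))
```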

The cnn.py file would be very similar to detection's faster_rcnn.py:

import pytorch_lightning as pl
from torch import nn

class CNN(pl.LightningModule):
    def __init__(self, model_name: str, num_classes: int, pretrained: bool = True):
        super().__init__()
        self.bottom, self.out_channels = _create_torchvision_backbone(model_name, num_classes, pretrained=pretrained)
        self.top = nn.Linear(self.out_channels, num_classes)

    def forward(self, x):
        x = self.bottom(x)
        x = self.top(x.view(-1, self.out_channels))
        return x

And the subsequent simple training_step, validation_step, etc. follow.
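Those steps might look roughly like this (a self-contained sketch using a plain nn.Module stand-in; in the real class they would live on the LightningModule above and use self.log):

```python
import torch
import torch.nn.functional as F
from torch import nn

class TinyCNN(nn.Module):
    # Stand-in for the LightningModule above, with a trivial linear "model"
    # so the snippet runs without torchvision or pytorch_lightning.
    def __init__(self, in_features: int = 8, num_classes: int = 3):
        super().__init__()
        self.top = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.top(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self(x), y)

model = TinyCNN()
batch = (torch.randn(4, 8), torch.randint(0, 3, (4,)))
loss = model.training_step(batch, 0)
```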

Let me know how freezing and unfreezing can be used!

Just a gist: the idea is to support torchvision models out of the box, and if users want their custom CNN to be fine-tuned, they can simply define their own model and still use our Lightning trainer. This was one of my ideas in an old PyPI package here.

I'm new here, and I guess this PR adds stuff that would need to be maintained in the future, hence I would need thoughts from people. (I'm unsure and don't want breaking code.) This introduces something new and may lead to a breaking refactor as well.

cc @ananyahjha93 @Borda @nateraw @williamFalcon

oke-aditya changed the title from "Re usable Backbones and CNN Trainer" to "Re usable Compenents and CNN Trainer" Sep 12, 2020
oke-aditya changed the title from "Re usable Compenents and CNN Trainer" to "Re usable Components and CNN Trainer" Sep 13, 2020
Borda (Member) commented Sep 14, 2020:

@teddykoker @justusschock this might be interesting for you...

oke-aditya (Contributor, Author) commented Oct 17, 2020:

Hello guys, sorry for the extreme delay. I have started implementing this.
For the initial prototype, I'm trying to get it working in the above repository.

I will raise a PR on Bolts soon once I get it working in the above repo.
(I have to do some prototyping and improve my code.)

oke-aditya mentioned this issue Oct 18, 2020
justusschock (Member):

@oke-aditya please slow down a bit :D
We're not yet sure how, or whether, this will be included in Bolts at all, so we don't want anybody to put effort into this when it may be discarded later on.

oke-aditya (Contributor, Author) commented Oct 19, 2020:

😄 No issues, I really like implementing models.
@justusschock 👍 Let's keep this PR on standby.

Borda added the let's do it! (Looking forward to have it implemented) label Oct 21, 2020
Borda added discussion and removed the model and let's do it! (Looking forward to have it implemented) labels Nov 6, 2020
Borda (Member) commented Nov 6, 2020:

@teddykoker mind having a look... 🐰

stale bot commented Jan 5, 2021:

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the won't fix (This will not be worked on) label Jan 5, 2021
stale bot closed this as completed Jan 14, 2021
Borda added this to the v0.3 milestone Jan 18, 2021
gau-nernst:

Hello, are there any updates on this? I'm re-implementing some research backbones (e.g. VoVNet) and want to train them on ImageNet to reproduce the results, and potentially use them as detection backbones for my other projects (with ImageNet pre-training).

I was looking at Bolts to see if there is anything that suits my use case, but there doesn't seem to be. Of course, implementing the training logic myself is straightforward, but I wonder if this use case could be covered by Bolts: researchers create new backbones and want to quickly get ImageNet results, obtain ImageNet pre-trained weights, and use the backbone for other vision tasks.
