Enhancing Object Detection #340

Closed · oke-aditya opened this issue Nov 6, 2020 · 2 comments · Fixed by #475
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Milestone: v0.3

Comments

oke-aditya (Contributor) commented Nov 6, 2020

🚀 Feature

As discussed, the following can be added.

F1. Support for backbones with FPNs. torchvision's resnet_fpn_backbone can be used directly.
F2. Support for non-FPN backbones. I have a hacky way to do this which won't support the trainable_backbone_layers parameter (see the sketch below).
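
For reference, a minimal sketch of the hacky non-FPN path, following the pattern from the torchvision detection tutorial (mobilenet_v2 and its 1280 output channels are just an illustrative choice):

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# Take the convolutional features of a classifier as the backbone.
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
# FasterRCNN needs to know the number of output channels of the backbone.
backbone.out_channels = 1280

# Without an FPN there is a single feature map, so one anchor generator
# and a single-level RoI pooler suffice.
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7, sampling_ratio=2)
model = FasterRCNN(backbone, num_classes=91,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)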

Motivation

  1. Originally, Faster R-CNN did not use an FPN; it used a VGG backbone. So we could keep non-FPN support for legacy purposes.
  2. Currently the backbone is hardcoded to resnet50. With F1 it would be possible to use resnet101, resnext, etc.

Pitch

There are two ways to support this. I feel both are equally good; it's a matter of design choice.

Pitch 1 (a simpler interface for the user)

  1. Just add an extra backbone init parameter; we handle things internally:
from typing import Optional

import pytorch_lightning as pl
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor, fasterrcnn_resnet50_fpn


class lit_frcnn(pl.LightningModule):
    """
    Creates a Faster R-CNN model which can be fine-tuned.
    """

    def __init__(self, learning_rate: float = 0.0001, num_classes: int = 91,
                 pretrained: bool = False, backbone: Optional[str] = None, fpn: bool = True,
                 pretrained_backbone: bool = True, trainable_backbone_layers: int = 3,
                 replace_head: bool = True, **kwargs):
        """
        Args:
            learning_rate: the learning rate
            num_classes: number of detection classes (including background)
            pretrained: if True, returns a model pre-trained on COCO train2017
            backbone: backbone name; None uses the default resnet50 FPN model
            fpn: if True, builds the backbone with an FPN
            pretrained_backbone: if True, the backbone is pre-trained on ImageNet
            trainable_backbone_layers: number of trainable resnet layers starting from the final block
            replace_head: if True, replaces the box predictor head with one for num_classes
        """
        super().__init__()
        self.learning_rate = learning_rate
        self.num_classes = num_classes
        self.backbone = backbone
        if backbone is None:
            # Default path: torchvision's resnet50 + FPN detector.
            self.model = fasterrcnn_resnet50_fpn(pretrained=pretrained,
                                                 pretrained_backbone=pretrained_backbone,
                                                 trainable_backbone_layers=trainable_backbone_layers)
            if replace_head:
                # Swap the classification head so it predicts num_classes classes.
                in_features = self.model.roi_heads.box_predictor.cls_score.in_features
                self.model.roi_heads.box_predictor = FastRCNNPredictor(in_features, self.num_classes)
        else:
            backbone_model = create_fastercnn_backbone(self.backbone, fpn, pretrained_backbone,
                                                       trainable_backbone_layers, **kwargs)
            self.model = FasterRCNN(backbone_model, num_classes=num_classes, **kwargs)

Here we would define a create_fastercnn_backbone function. It handles both resnet_fpn_backbone and the very hacky non-FPN path. We could also support passing backbone as an nn.Module; that would involve one more if statement.
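
A rough sketch of what create_fastercnn_backbone could look like, assuming torchvision's resnet_fpn_backbone for the FPN path; the non-FPN branch below is the hacky part and only illustrates one backbone:

import torch.nn as nn
import torchvision
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone


def create_fastercnn_backbone(backbone: str, fpn: bool = True,
                              pretrained_backbone: bool = True,
                              trainable_backbone_layers: int = 3,
                              **kwargs) -> nn.Module:
    if fpn:
        # torchvision handles resnet-family backbones with FPN for us.
        backbone_model = resnet_fpn_backbone(backbone, pretrained=pretrained_backbone,
                                             trainable_layers=trainable_backbone_layers)
    else:
        # Hacky non-FPN path: truncate a classifier and record its output channels.
        # Only mobilenet_v2 is shown; trainable_backbone_layers is not supported here.
        if backbone == "mobilenet_v2":
            backbone_model = torchvision.models.mobilenet_v2(pretrained=pretrained_backbone).features
            backbone_model.out_channels = 1280
        else:
            raise ValueError(f"Unsupported non-FPN backbone: {backbone}")
    return backbone_model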

This is a kind of parameter overloading and might not be very wise, but it does the job with a simple API.

Pitch 2 (a little more Pythonic, but slightly more complex)

Use a staticmethod to create the backbone and allow passing the backbone as an nn.Module.
This involves two steps, but some people consider it more organized.
The API would be:

backbone = lit_frcnn.create_backbone("resnet50", fpn=True, pretrained_backbone=True, trainable_backbone_layers=3)
frcnn_model = lit_frcnn(learning_rate=0.0001, num_classes=91, backbone=backbone, ...)
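
A minimal sketch of that staticmethod, assuming it delegates to the same create_fastercnn_backbone helper from Pitch 1 (only where the call lives changes):

import torch.nn as nn
import pytorch_lightning as pl


class lit_frcnn(pl.LightningModule):
    # ... __init__ as above, also accepting backbone as an nn.Module ...

    @staticmethod
    def create_backbone(backbone: str, fpn: bool = True,
                        pretrained_backbone: bool = True,
                        trainable_backbone_layers: int = 3) -> nn.Module:
        # Same logic as Pitch 1's free function, but bound to the class,
        # so every new model (e.g. RetinaNet) would need to define it again.
        return create_fastercnn_backbone(backbone, fpn, pretrained_backbone,
                                         trainable_backbone_layers)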

This is a trickier API, and I'm not sure which convention Lightning follows.

1 is simpler for the user. We can reuse create_fastercnn_backbone, as it is a free function which can live outside the model file.
2 is slightly more involved and couples backbone creation with FRCNN, so for newer models we would need to define this static method again (e.g. for RetinaNet).

Alternatives

The other way is to start creating abstract classes and inheriting from them for such stuff. But again, this makes the API very complex and hard to maintain, and people would be bound to these abstractions over and over.

IMHO

  1. If we are decoupling all the models, 1 is far better than 2; we would not have to rewrite these backbones again.
  2. Alternative 2 is easy on the eyes, but the caveats show up later: it introduces OOP-style abstractions into bolts, which starts creating inconsistent interfaces.

Additional context

I am considering coupling this with #286. The non-FPN backbone code can be used in transfer learning as well (I have made it work).

cc @ananyahjha93 @Borda

oke-aditya added the enhancement and help wanted labels on Nov 6, 2020
oke-aditya mentioned this issue on Nov 6, 2020
briankosw (Contributor) commented

I think option 1 is better as well, and it seems better aligned with the design principles of PyTorch Lightning (it also makes a lot of sense w.r.t. #286, as you mentioned). I would love to get involved in this effort. What's the status on this so far?

oke-aditya (Contributor, Author) commented

Unsure if it will go into bolts, but this API is implemented in Quickvision!

Borda added this to the v0.3 milestone on Jan 18, 2021