Description
Models description
Hello.
There is a new, interesting version of the Mask R-CNN model in TorchVision (link).
maskrcnn_resnet50_fpn_v2 - Improved Mask R-CNN model with a ResNet-50-FPN backbone from the Benchmarking Detection Transfer Learning with Vision Transformers paper.
The maskrcnn_resnet50_fpn_v2 model gives a noticeable improvement (link) in MS COCO metrics compared with the classic maskrcnn_resnet50_fpn.
I have seen some fine-tuning examples. The code for fine-tuning maskrcnn_resnet50_fpn_v2 and maskrcnn_resnet50_fpn is identical.
The MMDetection framework supports fine-tuning of the classic TorchVision maskrcnn_resnet50_fpn. It would be great if MMDetection also supported the new TorchVision maskrcnn_resnet50_fpn_v2.
Describe the solution you'd like
It would be great if the MMDetection framework also supported the new TorchVision maskrcnn_resnet50_fpn_v2. There are also updated versions of two other detectors: FasterRCNN_ResNet50_FPN_V2 and RetinaNet_ResNet50_FPN_V2.
P.S.
MMDetection already includes many excellent detection networks. But it matters that Faster R-CNN and Mask R-CNN are multi-stage detectors, while most of the more accurate near-real-time detectors are single-stage.
In one competition I used YOLOv7, which has a higher detection metric on MS COCO (53). But the winners used the classic multi-stage Faster R-CNN, which scores only 37. It turned out that on a dataset with crowded objects, Faster R-CNN works better than the single-stage YOLOv7, even though the MS COCO metrics differ greatly in YOLOv7's favor.
Open source status
- The model implementation is available
- The model weights are available.
Provide useful links for the implementation
The improved Mask R-CNN v2 model with a ResNet-50-FPN backbone is described in the Benchmarking Detection Transfer Learning with Vision Transformers paper.
There is an implementation of this model in PyTorch TorchVision (link).
The [MaskRCNN_ResNet50_FPN_V2_Weights.COCO_V1] weights are available in torchvision too (https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.maskrcnn_resnet50_fpn_v2.html#torchvision.models.detection.MaskRCNN_ResNet50_FPN_V2_Weights).
Here is the link to the pull request (pytorch/vision#5773).
It seems that @datumbox is the author of the code.
The improved Faster R-CNN v2 model with a ResNet-50-FPN backbone comes from the same Benchmarking Detection Transfer Learning with Vision Transformers paper.
There is an implementation of this model in PyTorch TorchVision (link).
The [FasterRCNN_ResNet50_FPN_V2_Weights.COCO_V1] weights are available in torchvision too (https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.fasterrcnn_resnet50_fpn_v2.html#torchvision.models.detection.FasterRCNN_ResNet50_FPN_V2_Weights).
Here is the link to the pull request (pytorch/vision#5763).
It seems that @datumbox is the author of the code.
There is no such information about RetinaNet_ResNet50_FPN_V2, but I think the TorchVision developers created it on the same principle.
There is an implementation of this model in PyTorch TorchVision (link).
The [RetinaNet_ResNet50_FPN_V2_Weights.COCO_V1] weights are available in torchvision too (https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.retinanet_resnet50_fpn_v2.html#torchvision.models.detection.RetinaNet_ResNet50_FPN_V2_Weights).
Here is the link to the pull request (pytorch/vision#5756).
It seems that @datumbox is the author of the code.
As I understand it, @datumbox created FasterRCNN_ResNet50_FPN_V2, MaskRCNN_ResNet50_FPN_V2 and RetinaNet_ResNet50_FPN_V2 on the same principle.
Perhaps you could improve the rest of the backbones available for these architectures in MMDetection in the same way =)
That would be just super )

