[New Models: adding PyTorch TorchVision's MaskRCNN_ResNet50_FPN_V2, FasterRCNN_ResNet50_FPN_V2 and RetinaNet_ResNet50_FPN_V2] 

### Models description

Hello.

There is a new, interesting version of the **Mask R-CNN** model in TorchVision ([link](https://pytorch.org/vision/stable/models.html#instance-segmentation)).
**maskrcnn_resnet50_fpn_v2** is an improved Mask R-CNN model with a ResNet-50-FPN backbone from the [Benchmarking Detection Transfer Learning with Vision Transformers](https://arxiv.org/abs/2111.11429) paper.

The maskrcnn_resnet50_fpn_v2 model gives a noticeable improvement ([link](https://pytorch.org/vision/stable/models.html#instance-segmentation)) on the MS COCO metrics in comparison with the classic **maskrcnn_resnet50_fpn**.

![image](https://user-images.githubusercontent.com/4876436/213477454-f7433bcd-a517-436b-be6e-3ee2ac7404a0.png)

I have seen some examples of fine-tuning. The code for fine-tuning **maskrcnn_resnet50_fpn_v2** and **maskrcnn_resnet50_fpn** is identical.
The MMDetection framework supports fine-tuning the classic TorchVision **maskrcnn_resnet50_fpn**. It would be great if MMDetection also supported the new TorchVision **maskrcnn_resnet50_fpn_v2**.

**Describe the solution you'd like**
It would be great if the MMDetection framework also supported the new TorchVision **maskrcnn_resnet50_fpn_v2**. There are also updated versions of two other detectors: **FasterRCNN_ResNet50_FPN_V2** and **RetinaNet_ResNet50_FPN_V2**.

![image](https://user-images.githubusercontent.com/4876436/213479643-5adbf44d-3855-45f5-b948-30ab884341d8.png)

**P.S.**
MMDetection already has many excellent neural networks for detection. But it is important that **Faster R-CNN and Mask R-CNN** are multi-stage detectors, while most of the more accurate near-real-time detectors are single-stage.

In one competition, I used **YOLOv7**, which has a higher detection metric on MS COCO (53). But the winners used the classic multi-stage **Faster R-CNN**, which scores only 37. It turned out that on a dataset with crowded objects, **Faster R-CNN** works better than the single-stage **YOLOv7**, even though the MS COCO metrics differ greatly in YOLOv7's favor.


### Open source status

- [X] The model implementation is available
- [X] The model weights are available.

### Provide useful links for the implementation

The improved Mask R-CNN v2 model with a ResNet-50-FPN backbone is described in the [Benchmarking Detection Transfer Learning with Vision Transformers](https://arxiv.org/abs/2111.11429) paper.
An implementation of this model is available in PyTorch TorchVision ([link](https://pytorch.org/vision/stable/models.html#instance-segmentation)).
The [MaskRCNN_ResNet50_FPN_V2_Weights.COCO_V1] weights are available in torchvision too (https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.maskrcnn_resnet50_fpn_v2.html#torchvision.models.detection.MaskRCNN_ResNet50_FPN_V2_Weights).
Here is the link to the pull request (https://github.com/pytorch/vision/pull/5773).
It seems that @datumbox is the author of the code.

The improved Faster R-CNN v2 model with a ResNet-50-FPN backbone comes from the same [Benchmarking Detection Transfer Learning with Vision Transformers](https://arxiv.org/abs/2111.11429) paper.
An implementation of this model is available in PyTorch TorchVision ([link](https://pytorch.org/vision/stable/models.html#object-detection)).
The [FasterRCNN_ResNet50_FPN_V2_Weights.COCO_V1] weights are available in torchvision too (https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.fasterrcnn_resnet50_fpn_v2.html#torchvision.models.detection.FasterRCNN_ResNet50_FPN_V2_Weights).
Here is the link to the pull request (https://github.com/pytorch/vision/pull/5763).
It seems that @datumbox is the author of the code.

There is no such paper reference for RetinaNet_ResNet50_FPN_V2, but I think that the TorchVision developers created it following the same principle.
An implementation of this model is available in PyTorch TorchVision ([link](https://pytorch.org/vision/stable/models.html#object-detection)).
The [RetinaNet_ResNet50_FPN_V2_Weights.COCO_V1] weights are available in torchvision too (https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.retinanet_resnet50_fpn_v2.html#torchvision.models.detection.RetinaNet_ResNet50_FPN_V2_Weights).
Here is the link to the pull request (https://github.com/pytorch/vision/pull/5756).
It seems that @datumbox is the author of the code.

As I understand it, @datumbox created FasterRCNN_ResNet50_FPN_V2, MaskRCNN_ResNet50_FPN_V2 and RetinaNet_ResNet50_FPN_V2 following the same principle.
Perhaps you could improve the other backbones that are available for these architectures in MMDetection in the same way =)
That would be just super )



[New Models: adding PyTorch TorchVision's MaskRCNN_ResNet50_FPN_V2, FasterRCNN_ResNet50_FPN_V2 and RetinaNet_ResNet50_FPN_V2] #9653