Release code of AAAI 2021 paper. "Temporal ROI Align for Video Object Recognition" (#247)

* release the code of Temporal RoI Align
* add docstring for troi code
* add unittest for troi code
* change README.md, metafile.yml, model_zoo.md and model-index.yml
* tiny changes of README.md
* tiny changes of metafile.yml
* tiny changes of README.md
* update based 1-st comments
* update based on 2-nd comments
Showing 16 changed files with 427 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Temporal RoI Align for Video Object Recognition | ||
|
||
## Introduction | ||
|
||
[ALGORITHM] | ||
|
||
```latex | ||
@inproceedings{gong2021temporal, | ||
title={Temporal ROI Align for Video Object Recognition}, | ||
author={Gong, Tao and Chen, Kai and Wang, Xinjiang and Chu, Qi and Zhu, Feng and Lin, Dahua and Yu, Nenghai and Feng, Huamin}, | ||
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence}, | ||
volume={35}, | ||
number={2}, | ||
pages={1442--1450}, | ||
year={2021} | ||
} | ||
``` | ||
|
||
## Results and models on ImageNet VID dataset | ||
|
||
We observed that the performance of this method fluctuates by about 0.5 mAP between runs. The checkpoint provided below is the better of two experiments. | ||
|
||
Note that this method uses 3 SELSA modules, whereas `SELSA` uses 2. This is because an additional SELSA module improves this method by 0.2 points but degrades `SELSA` by 0.5 points. We chose the best setting for each method for a fair comparison. | ||
|
||
| Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP@50 | Config | Download | | ||
| :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: | | ||
| R-50-DC5 | pytorch | 7e | 4.14 | - | 79.8 | [config](selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py) | [model](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid_20210820_162714-939fd657.pth) &#124; [log](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid_20210820_162714.log.json) | | ||
| R-101-DC5 | pytorch | 7e | 5.83 | - | 82.6 | [config](selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py) | [model](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621-22cb96b9.pth) &#124; [log](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621.log.json) | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
Collections: | ||
  - Name: Temporal RoI Align | ||
    Metadata: | ||
      Training Data: ILSVRC | ||
      Training Techniques: | ||
        - SGD with Momentum | ||
      Training Resources: 8x V100 GPUs | ||
      Architecture: | ||
        - ResNet | ||
    Paper: https://ojs.aaai.org/index.php/AAAI/article/view/16234 | ||
    README: configs/vid/temporal_roi_align/README.md | ||
|
||
Models: | ||
  - Name: selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid | ||
    In Collection: SELSA-TemporalRoIAlign | ||
    Config: configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py | ||
    Metadata: | ||
      Training Memory (GB): 4.14 | ||
    Results: | ||
      - Task: Video Object Detection | ||
        Dataset: ILSVRC | ||
        Metrics: | ||
          box AP@0.5: 79.8 | ||
    Weights: https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid_20210820_162714-939fd657.pth | ||
|
||
  - Name: selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid | ||
    In Collection: SELSA-TemporalRoIAlign | ||
    Config: configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py | ||
    Metadata: | ||
      Training Memory (GB): 5.83 | ||
    Results: | ||
      - Task: Video Object Detection | ||
        Dataset: ILSVRC | ||
        Metrics: | ||
          box AP@0.5: 82.6 | ||
    Weights: https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621-22cb96b9.pth |
7 changes: 7 additions & 0 deletions
configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
_base_ = ['./selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py'] | ||
model = dict( | ||
    detector=dict( | ||
        backbone=dict( | ||
            depth=101, | ||
            init_cfg=dict( | ||
                type='Pretrained', checkpoint='torchvision://resnet101')))) |
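
The R-101 config above only lists the keys that differ from the R-50 config it inherits through `_base_`; everything else is taken from the parent. The following is a minimal pure-Python sketch of that kind of recursive merge, including the `_delete_=True` key that the R-50 config uses to replace an inherited dictionary rather than merge into it. It is not mmcv's implementation: `merge_cfg` is a hypothetical helper, and the values in `base` are assumed stand-ins chosen only for illustration.

```python
import copy


def merge_cfg(base, child):
    """Recursively merge `child` into a copy of `base` (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's dict is untouched
            delete = value.pop('_delete_', False)
            if not delete and isinstance(merged.get(key), dict):
                # Normal case: keep inherited keys, override the listed ones.
                merged[key] = merge_cfg(merged[key], value)
            else:
                # `_delete_=True`, or nothing to merge into: replace outright.
                merged[key] = merge_cfg({}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Assumed stand-in for inherited settings (illustrative values only).
base = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    ref_img_sampler=dict(num_ref_imgs=2, frame_range=9))

child = dict(
    backbone=dict(
        depth=101,
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet101')),
    ref_img_sampler=dict(_delete_=True, num_ref_imgs=14, frame_range=[-7, 7]))

cfg = merge_cfg(base, child)
print(cfg['backbone'])         # `type` is inherited, `depth` is overridden
print(cfg['ref_img_sampler'])  # inherited keys are dropped by `_delete_`
```

In this sketch the `backbone` entry keeps its inherited `type` while `depth` and `init_cfg` are overridden, which is why the R-101 file can stay at seven lines.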
55 changes: 55 additions & 0 deletions
configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
_base_ = [ | ||
    '../../_base_/models/faster_rcnn_r50_dc5.py', | ||
    '../../_base_/datasets/imagenet_vid_fgfa_style.py', | ||
    '../../_base_/default_runtime.py' | ||
] | ||
model = dict( | ||
    type='SELSA', | ||
    detector=dict( | ||
        roi_head=dict( | ||
            type='SelsaRoIHead', | ||
            bbox_roi_extractor=dict( | ||
                type='TemporalRoIAlign', | ||
                num_most_similar_points=2, | ||
                num_temporal_attention_blocks=4, | ||
                roi_layer=dict( | ||
                    type='RoIAlign', output_size=7, sampling_ratio=2), | ||
                out_channels=512, | ||
                featmap_strides=[16]), | ||
            bbox_head=dict( | ||
                type='SelsaBBoxHead', | ||
                num_shared_fcs=3, | ||
                aggregator=dict( | ||
                    type='SelsaAggregator', | ||
                    in_channels=1024, | ||
                    num_attention_blocks=16))))) | ||
|
||
# dataset settings | ||
data = dict( | ||
    val=dict( | ||
        ref_img_sampler=dict( | ||
            _delete_=True, | ||
            num_ref_imgs=14, | ||
            frame_range=[-7, 7], | ||
            method='test_with_adaptive_stride')), | ||
    test=dict( | ||
        ref_img_sampler=dict( | ||
            _delete_=True, | ||
            num_ref_imgs=14, | ||
            frame_range=[-7, 7], | ||
            method='test_with_adaptive_stride'))) | ||
|
||
# optimizer | ||
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) | ||
optimizer_config = dict( | ||
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) | ||
# learning policy | ||
lr_config = dict( | ||
    policy='step', | ||
    warmup='linear', | ||
    warmup_iters=500, | ||
    warmup_ratio=1.0 / 3, | ||
    step=[2, 5]) | ||
# runtime settings | ||
total_epochs = 7 | ||
evaluation = dict(metric=['bbox'], interval=7) |
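
The `optimizer` and `lr_config` settings above describe a step schedule with linear warmup: the learning rate starts at one third of `lr=0.01`, ramps up over the first 500 iterations, and is reduced at epochs 2 and 5. The sketch below works through that arithmetic. It is an illustration, not the training framework's code: the function name `learning_rate` is hypothetical, and it assumes a decay factor `gamma=0.1` (not stated in the config), 0-based epoch counting, and warmup measured in iterations.

```python
def learning_rate(epoch, cur_iter, base_lr=0.01, steps=(2, 5), gamma=0.1,
                  warmup_iters=500, warmup_ratio=1.0 / 3):
    """Learning rate for a 0-based `epoch` and global iteration `cur_iter`."""
    # Step policy: multiply by `gamma` once for every milestone already reached.
    num_decays = sum(1 for milestone in steps if epoch >= milestone)
    regular_lr = base_lr * gamma ** num_decays
    if cur_iter >= warmup_iters:
        return regular_lr
    # Linear warmup: scale from `warmup_ratio` up to 1 over `warmup_iters`.
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return regular_lr * (1 - k)


# The iteration counts for epochs 2 and 5 are arbitrary values past warmup.
for epoch, cur_iter in [(0, 0), (0, 250), (0, 500), (2, 10000), (5, 25000)]:
    print(epoch, cur_iter, learning_rate(epoch, cur_iter))
```

Under these assumptions the rate is roughly 0.0033 at the first iteration, 0.01 once warmup ends, 0.001 from epoch 2, and 0.0001 from epoch 5 until training stops at `total_epochs = 7`.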
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,8 @@ | ||
 # Copyright (c) OpenMMLab. All rights reserved. | ||
 from .bbox_heads import SelsaBBoxHead | ||
+from .roi_extractors import SingleRoIExtractor, TemporalRoIAlign | ||
 from .selsa_roi_head import SelsaRoIHead | ||
|
||
-__all__ = ['SelsaRoIHead', 'SelsaBBoxHead'] | ||
+__all__ = [ | ||
+    'SelsaRoIHead', 'SelsaBBoxHead', 'TemporalRoIAlign', 'SingleRoIExtractor' | ||
+] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
from .single_level_roi_extractor import SingleRoIExtractor | ||
from .temporal_roi_align import TemporalRoIAlign | ||
|
||
__all__ = ['SingleRoIExtractor', 'TemporalRoIAlign'] |
19 changes: 19 additions & 0 deletions
mmtrack/models/roi_heads/roi_extractors/single_level_roi_extractor.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
from mmcv.runner import force_fp32 | ||
from mmdet.models.builder import ROI_EXTRACTORS | ||
from mmdet.models.roi_heads.roi_extractors import \ | ||
    SingleRoIExtractor as _SingleRoIExtractor | ||
|
||
|
||
@ROI_EXTRACTORS.register_module(force=True) | ||
class SingleRoIExtractor(_SingleRoIExtractor): | ||
    """Extract RoI features from a single level feature map. | ||
    This Class is the same as `SingleRoIExtractor` from | ||
    `mmdet.models.roi_heads.roi_extractors` except for using `**kwargs` to | ||
    accept external arguments. | ||
    """ | ||
|
||
    @force_fp32(apply_to=('feats', ), out_fp16=True) | ||
    def forward(self, feats, rois, roi_scale_factor=None, **kwargs): | ||
        """Forward function.""" | ||
        return super().forward(feats, rois, roi_scale_factor) |