Release code of AAAI 2021 paper. "Temporal ROI Align for Video Object Recognition" (#247)

* release the code of Temporal RoI Align

* add docstring for troi code

* add unittest for troi code

* change README.md, metafile.yml, model_zoo.md and model-index.yml

* tiny changes of README.md

* tiny changes of metafile.yml

* tiny changes of README.md

* update based on 1-st comments

* update based on 2-nd comments
GT9505 authored Aug 25, 2021
1 parent a1fb68b commit 8016b90
Showing 16 changed files with 427 additions and 3 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -59,6 +59,7 @@ Supported methods of video object detection:
- [x] [DFF](configs/vid/dff) (CVPR 2017)
- [x] [FGFA](configs/vid/fgfa) (ICCV 2017)
- [x] [SELSA](configs/vid/selsa) (ICCV 2019)
- [x] [Temporal RoI Align](configs/vid/temporal_roi_align) (AAAI 2021)

Supported methods of multi object tracking:

28 changes: 28 additions & 0 deletions configs/vid/temporal_roi_align/README.md
@@ -0,0 +1,28 @@
# Temporal RoI Align for Video Object Recognition

## Introduction

[ALGORITHM]

```latex
@inproceedings{gong2021temporal,
title={Temporal ROI Align for Video Object Recognition},
author={Gong, Tao and Chen, Kai and Wang, Xinjiang and Chu, Qi and Zhu, Feng and Lin, Dahua and Yu, Nenghai and Feng, Huamin},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={2},
pages={1442--1450},
year={2021}
}
```

## Results and models on ImageNet VID dataset

We observed that the performance of this method has a fluctuation of about 0.5 mAP. The checkpoint provided below is the best one from two experiments.

Note that this method uses 3 selsa modules while `SELSA` uses 2. This is because an additional selsa module improves this method by 0.2 points but degrades `SELSA` by 0.5 points. We choose the best setting for each method for a fair comparison.

| Backbone | Style | Lr schd | Mem (GB) | Inf time (fps) | box AP@50 | Config | Download |
| :-------------: | :-----: | :-----: | :------: | :------------: | :----: | :------: | :--------: |
| R-50-DC5 | pytorch | 7e | 4.14 | - | 79.8 | [config](selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py) | [model](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid_20210820_162714-939fd657.pth) &#124; [log](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid_20210820_162714.log.json) |
| R-101-DC5 | pytorch | 7e | 5.83 | - | 82.6 | [config](selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py) | [model](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621-22cb96b9.pth) &#124; [log](https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621.log.json) |
36 changes: 36 additions & 0 deletions configs/vid/temporal_roi_align/metafile.yml
@@ -0,0 +1,36 @@
Collections:
  - Name: Temporal RoI Align
    Metadata:
      Training Data: ILSVRC
      Training Techniques:
        - SGD with Momentum
      Training Resources: 8x V100 GPUs
      Architecture:
        - ResNet
    Paper: https://ojs.aaai.org/index.php/AAAI/article/view/16234
    README: configs/vid/temporal_roi_align/README.md

Models:
  - Name: selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid
    In Collection: SELSA-TemporalRoIAlign
    Config: configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py
    Metadata:
      Training Memory (GB): 4.14
    Results:
      - Task: Video Object Detection
        Dataset: ILSVRC
        Metrics:
          box AP@0.5: 79.8
    Weights: https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid_20210820_162714-939fd657.pth

  - Name: selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid
    In Collection: SELSA-TemporalRoIAlign
    Config: configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py
    Metadata:
      Training Memory (GB): 5.83
    Results:
      - Task: Video Object Detection
        Dataset: ILSVRC
        Metrics:
          box AP@0.5: 82.6
    Weights: https://download.openmmlab.com/mmtracking/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid_20210822_111621-22cb96b9.pth
7 changes: 7 additions & 0 deletions configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r101_dc5_7e_imagenetvid.py
@@ -0,0 +1,7 @@
_base_ = ['./selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py']
model = dict(
    detector=dict(
        backbone=dict(
            depth=101,
            init_cfg=dict(
                type='Pretrained', checkpoint='torchvision://resnet101'))))
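
Reviewer note: the R-101 config above only lists the keys that differ from the R-50 config named in `_base_`; everything else is inherited. The sketch below is a simplified illustration of that merge behaviour, including the `_delete_=True` flag that the R-50 config in this commit uses. It is not MMCV's actual implementation: `merge_config` is a hypothetical helper, and the `base`/`child` dictionaries (including `hypothetical_key`) are made-up stand-ins for the real config files.

```python
import copy


def merge_config(base, child):
    """Recursively merge `child` into a copy of `base`.

    Hypothetical helper for illustration only; it mimics the documented
    behaviour of `_base_` inheritance and `_delete_`, nothing more.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's dict is untouched
            delete = value.pop('_delete_', False)
            if not delete and isinstance(merged.get(key), dict):
                # Normal case: keep inherited keys, override the listed ones.
                merged[key] = merge_config(merged[key], value)
            else:
                # `_delete_=True` (or no inherited dict): drop the inherited
                # value and use only what the child provides.
                merged[key] = merge_config({}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up stand-ins for a base config and a child config.
base = dict(
    detector=dict(backbone=dict(type='ResNet', depth=50)),
    optimizer_config=dict(grad_clip=None, hypothetical_key=1))
child = dict(
    detector=dict(backbone=dict(depth=101)),
    optimizer_config=dict(
        _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)))

merged = merge_config(base, child)
print(merged['detector']['backbone'])
print(merged['optimizer_config'])
```

In this sketch `depth` is overridden while `type` survives from the base, and `optimizer_config` is replaced wholesale because of `_delete_=True`.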
55 changes: 55 additions & 0 deletions configs/vid/temporal_roi_align/selsa_troialign_faster_rcnn_r50_dc5_7e_imagenetvid.py
@@ -0,0 +1,55 @@
_base_ = [
    '../../_base_/models/faster_rcnn_r50_dc5.py',
    '../../_base_/datasets/imagenet_vid_fgfa_style.py',
    '../../_base_/default_runtime.py'
]
model = dict(
    type='SELSA',
    detector=dict(
        roi_head=dict(
            type='SelsaRoIHead',
            bbox_roi_extractor=dict(
                type='TemporalRoIAlign',
                num_most_similar_points=2,
                num_temporal_attention_blocks=4,
                roi_layer=dict(
                    type='RoIAlign', output_size=7, sampling_ratio=2),
                out_channels=512,
                featmap_strides=[16]),
            bbox_head=dict(
                type='SelsaBBoxHead',
                num_shared_fcs=3,
                aggregator=dict(
                    type='SelsaAggregator',
                    in_channels=1024,
                    num_attention_blocks=16)))))

# dataset settings
data = dict(
    val=dict(
        ref_img_sampler=dict(
            _delete_=True,
            num_ref_imgs=14,
            frame_range=[-7, 7],
            method='test_with_adaptive_stride')),
    test=dict(
        ref_img_sampler=dict(
            _delete_=True,
            num_ref_imgs=14,
            frame_range=[-7, 7],
            method='test_with_adaptive_stride')))

# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[2, 5])
# runtime settings
total_epochs = 7
evaluation = dict(metric=['bbox'], interval=7)
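
Reviewer note: to make the `optimizer` and `lr_config` settings above concrete, here is a rough sketch of the learning rate they imply over the 7 epochs. It assumes a decay factor `gamma=0.1` at each step (the config does not set it, so this is an assumed default) and the usual linear-warmup formula; `learning_rate` is a hypothetical helper, not a function from MMCV or MMTracking.

```python
def learning_rate(epoch, cur_iter, base_lr=0.01, steps=(2, 5), gamma=0.1,
                  warmup_iters=500, warmup_ratio=1.0 / 3):
    """Approximate learning rate for a step policy with linear warmup.

    `epoch` is the zero-based epoch index and `cur_iter` the global
    iteration count. `gamma=0.1` is an assumption, not taken from the config.
    """
    # Count how many decay milestones have been reached so far.
    num_decays = sum(1 for milestone in steps if epoch >= milestone)
    regular_lr = base_lr * gamma ** num_decays
    if cur_iter < warmup_iters:
        # Linear warmup: start at warmup_ratio * lr and ramp up to lr.
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


# Print the post-warmup learning rate for each of the 7 epochs.
for epoch in range(7):
    print(epoch, learning_rate(epoch, cur_iter=10_000))
```

Under these assumptions the rate starts at one third of 0.01, reaches 0.01 after 500 iterations, and is divided by 10 at epochs 2 and 5.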
4 changes: 4 additions & 0 deletions docs/model_zoo.md
@@ -33,6 +33,10 @@ Please refer to [FGFA](https://github.com/open-mmlab/mmtracking/blob/master/conf

Please refer to [SELSA](https://github.com/open-mmlab/mmtracking/blob/master/configs/vid/selsa) for details.

### Temporal RoI Align (AAAI 2021)

Please refer to [Temporal RoI Align](https://github.com/open-mmlab/mmtracking/blob/master/configs/vid/temporal_roi_align) for details.

## Baselines of multiple object tracking

### SORT/DeepSORT (ICIP 2016/2017)
4 changes: 4 additions & 0 deletions docs_zh-CN/model_zoo.md
@@ -33,6 +33,10 @@

Please refer to [SELSA](../configs/vid/selsa/README.md) for details.

### Temporal RoI Align (AAAI 2021)

Please refer to [Temporal RoI Align](https://github.com/open-mmlab/mmtracking/blob/master/configs/vid/temporal_roi_align) for details.

## Baselines of multiple object tracking

### SORT/DeepSORT (ICIP 2016/2017)
5 changes: 4 additions & 1 deletion mmtrack/models/roi_heads/__init__.py
@@ -1,5 +1,8 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from .bbox_heads import SelsaBBoxHead
+from .roi_extractors import SingleRoIExtractor, TemporalRoIAlign
 from .selsa_roi_head import SelsaRoIHead
 
-__all__ = ['SelsaRoIHead', 'SelsaBBoxHead']
+__all__ = [
+    'SelsaRoIHead', 'SelsaBBoxHead', 'TemporalRoIAlign', 'SingleRoIExtractor'
+]
4 changes: 4 additions & 0 deletions mmtrack/models/roi_heads/roi_extractors/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
from .single_level_roi_extractor import SingleRoIExtractor
from .temporal_roi_align import TemporalRoIAlign

__all__ = ['SingleRoIExtractor', 'TemporalRoIAlign']
19 changes: 19 additions & 0 deletions mmtrack/models/roi_heads/roi_extractors/single_level_roi_extractor.py
@@ -0,0 +1,19 @@
from mmcv.runner import force_fp32
from mmdet.models.builder import ROI_EXTRACTORS
from mmdet.models.roi_heads.roi_extractors import \
    SingleRoIExtractor as _SingleRoIExtractor


@ROI_EXTRACTORS.register_module(force=True)
class SingleRoIExtractor(_SingleRoIExtractor):
    """Extract RoI features from a single level feature map.
    This Class is the same as `SingleRoIExtractor` from
    `mmdet.models.roi_heads.roi_extractors` except for using `**kwargs` to
    accept external arguments.
    """

    @force_fp32(apply_to=('feats', ), out_fp16=True)
    def forward(self, feats, rois, roi_scale_factor=None, **kwargs):
        """Forward function."""
        return super().forward(feats, rois, roi_scale_factor)
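
Reviewer note: the docstring above says the only change from the MMDetection class is the added `**kwargs`. A plausible reason, sketched below with toy classes, is that a caller can then pass the same extra keyword arguments to any extractor without checking its type. The class names, the `extract` helper, and the `ref_feats` argument are all assumptions made for this illustration and are not taken from the commit.

```python
class FixedSignatureExtractor:
    """Toy stand-in for an extractor that accepts no extra arguments."""

    def forward(self, feats, rois, roi_scale_factor=None):
        return ('single-frame', len(feats), len(rois))


class KwargsExtractor(FixedSignatureExtractor):
    """Same behaviour, but silently ignores unknown keyword arguments."""

    def forward(self, feats, rois, roi_scale_factor=None, **kwargs):
        return super().forward(feats, rois, roi_scale_factor)


def extract(extractor, feats, rois, ref_feats):
    # Hypothetical caller: always forwards reference-frame features,
    # whether or not the extractor makes use of them.
    return extractor.forward(feats, rois, ref_feats=ref_feats)


feats, rois, ref_feats = [0.1, 0.2], [(0, 0, 4, 4)], [0.3]

# Works: the extra keyword is accepted and dropped.
result = extract(KwargsExtractor(), feats, rois, ref_feats)

# Fails: the fixed signature rejects the unexpected keyword.
try:
    extract(FixedSignatureExtractor(), feats, rois, ref_feats)
    rejected = False
except TypeError:
    rejected = True
```

The trade-off of this pattern is that misspelled keyword arguments are also swallowed silently, so it suits internal call sites better than public APIs.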