Commit bf6694b

Release OmniSource ckpts (open-mmlab#215)
* init commit
* update README
* update ckpt links
* update changelog
* update
* fix path
* minor fix
* reorganize
1 parent d8125cb commit bf6694b

File tree

6 files changed (+173, -1 lines)

+33
@@ -0,0 +1,33 @@
# Omni-sourced Webly-supervised Learning for Video Recognition

[Haodong Duan](https://github.com/kennymckormick), [Yue Zhao](https://github.com/zhaoyue-zephyrus), [Yuanjun Xiong](https://github.com/yjxiong), Wentao Liu, [Dahua Lin](https://github.com/lindahua)

In ECCV, 2020. [Paper](https://arxiv.org/abs/2003.13042)

![pipeline](pipeline.png)

### Release

We have released 4 models trained with the OmniSource framework, covering both 2D and 3D architectures. The table below compares the performance of models trained with and without OmniSource.

| Model | Modality | Pretrained | Backbone | Input | Resolution | Top-1 (Baseline / OmniSource (Delta)) | Top-5 (Baseline / OmniSource (Delta)) | Download |
| :------: | :------: | :--------: | :-------: | :---: | :------------: | :-----------------------------------: | :-----------------------------------: | :----------------------------------------------------------: |
| TSN | RGB | ImageNet | ResNet50 | 3seg | 340x256 | 70.6 / 73.6 (+3.0) | 89.4 / 91.0 (+1.6) | [Baseline](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth) |
| TSN | RGB | IG-1B | ResNet50 | 3seg | short-side 320 | 73.1 / 75.7 (+2.6) | 90.4 / 91.9 (+1.5) | [Baseline](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_without_omni_1x1x3_kinetics400_rgb_20200926-c133dd49.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth) |
| SlowOnly | RGB | Scratch | ResNet50 | 4x16 | short-side 320 | 72.9 / 76.8 (+3.9) | 90.9 / 92.5 (+1.6) | [Baseline](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth) |
| SlowOnly | RGB | Scratch | ResNet101 | 8x8 | short-side 320 | 76.5 / 80.4 (+3.9) | 92.7 / 94.4 (+1.7) | [Baseline](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_without_omni_8x8x1_kinetics400_rgb_20200926-0c730aef.pth) / [OmniSource](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth) |

We will soon release the web dataset and training code used by OmniSource.

### Citing OmniSource

If you find OmniSource useful for your research, please consider citing the paper using the following BibTeX entry.

```
@article{duan2020omni,
  title={Omni-sourced Webly-supervised Learning for Video Recognition},
  author={Duan, Haodong and Zhao, Yue and Xiong, Yuanjun and Liu, Wentao and Lin, Dahua},
  journal={arXiv preprint arXiv:2003.13042},
  year={2020}
}
```
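
For a quick check of a released checkpoint, here is a minimal inference sketch. It assumes the mmaction2 0.x Python API (`init_recognizer` / `inference_recognizer`); the local video and label-map paths are placeholders.

```python
# Minimal sketch (assumes the mmaction2 0.x API); video/label paths are placeholders.
from mmaction.apis import inference_recognizer, init_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py'
# OmniSource TSN checkpoint from the table above
checkpoint = ('https://download.openmmlab.com/mmaction/recognition/tsn/omni/'
              'tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth')

model = init_recognizer(config_file, checkpoint, device='cuda:0')
# Returns the top-scoring (label, score) pairs for the input video.
results = inference_recognizer(model, 'demo/demo.mp4', 'demo/label_map.txt')
for label, score in results:
    print(f'{label}: {score:.4f}')
```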

pipeline.png (239 KB, binary image file added)

configs/recognition/slowonly/README.md

+9
@@ -36,6 +36,15 @@ In data benchmark, we compare two different data preprocessing methods: (1) Resi
| [slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb](data_benchmark/slowonly_r50_randomresizedcrop_320p_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | 8x2 | ResNet50 | 4x16 | None | 73.02 | 90.77 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) |
| [slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb](data_benchmark/slowonly_r50_randomresizedcrop_256p_4x16x1_256e_kinetics400_rgb.py) | short-side 256 | 8x4 | ResNet50 | 4x16 | None | 72.76 | 90.51 | 10 clips x 3 crops | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb_20200820-bea7701f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_256p_4x16x1_256e_kinetics400_rgb/20200817_001411.log.json) |

### Kinetics-400 OmniSource Experiments

| config | resolution | backbone | pretrain | w. OmniSource | top1 acc | top5 acc | inference_time (video/s) | gpu_mem (M) | ckpt | log | json |
| :----------------------------------------------------------: | :------------: | :-------: | :------: | :----------------: | :------: | :------: | :----------------------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| [slowonly_r50_4x16x1_256e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb.py) | short-side 320 | ResNet50 | None | :x: | 73.0 | 90.8 | 4.3 (25x10 frames) | 3168 | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16x1_256e_kinetics400_rgb_20200704-a69556c6.pth) | [log](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/so_4x16.log) | [json](https://download.openmmlab.com/mmaction/recognition/slowonly/slowonly_r50_4x16x1_256e_kinetics400_rgb/slowonly_r50_4x16_73.02_90.77.log.json) |
| x | x | ResNet50 | None | :heavy_check_mark: | 76.8 | 92.5 | x | x | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r50_omni_4x16x1_kinetics400_rgb_20200926-51b1f7ea.pth) | x | x |
| [slowonly_r101_8x8x1_196e_kinetics400_rgb](/configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py) | x | ResNet101 | None | :x: | 76.5 | 92.7 | x | x | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_without_omni_8x8x1_kinetics400_rgb_20200926-0c730aef.pth) | x | x |
| x | x | ResNet101 | None | :heavy_check_mark: | 80.4 | 94.4 | x | x | [ckpt](https://download.openmmlab.com/mmaction/recognition/slowonly/omni/slowonly_r101_omni_8x8x1_kinetics400_rgb_20200926-b5dbb701.pth) | x | x |

Notes:

1. The **gpus** column indicates the number of GPUs we used to get the checkpoint. Note that the configs we provide default to 8 GPUs.
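
To make the top1/top5 columns above concrete, here is a small self-contained sketch of how a top-k accuracy is computed from per-video class scores. This is plain NumPy for illustration, not the repository's own evaluation code; the score and label arrays are made up.

```python
# Illustrative only: top-k accuracy from per-video class scores
# (plain NumPy, not the repository's own evaluation code).
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """scores: (num_videos, num_classes); labels: (num_videos,)."""
    topk = np.argsort(scores, axis=1)[:, -k:]      # k best classes per video
    hits = (topk == labels[:, None]).any(axis=1)   # ground truth among them?
    return float(hits.mean())

scores = np.random.rand(4, 400)          # e.g. 4 videos, 400 Kinetics classes
labels = np.array([3, 10, 7, 399])
print(top_k_accuracy(scores, labels, 1), top_k_accuracy(scores, labels, 5))
```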

configs/recognition/slowonly/slowonly_r101_8x8x1_196e_kinetics400_rgb.py

+118
@@ -0,0 +1,118 @@
# model settings
model = dict(
    type='Recognizer3D',
    backbone=dict(
        type='ResNet3dSlowOnly',
        depth=101,
        pretrained=None,
        lateral=False,
        conv1_kernel=(1, 7, 7),
        conv1_stride_t=1,
        pool1_stride_t=1,
        inflate=(0, 0, 1, 1),
        norm_eval=False),
    cls_head=dict(
        type='I3DHead',
        in_channels=2048,
        num_classes=400,
        spatial_type='avg',
        dropout_ratio=0.5))
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
dataset_type = 'RawframeDataset'
data_root = 'data/kinetics400/rawframes_train'
data_root_val = 'data/kinetics400/rawframes_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_rawframes.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
ann_file_test = 'data/kinetics400/kinetics400_val_list_rawframes.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='SampleFrames', clip_len=8, frame_interval=8, num_clips=1),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='RandomResizedCrop'),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=8,
        num_clips=1,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(
        type='SampleFrames',
        clip_len=8,
        frame_interval=8,
        num_clips=10,
        test_mode=True),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='ThreeCrop', crop_size=256),
    dict(type='Flip', flip_ratio=0),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCTHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD', lr=0.1, momentum=0.9,
    weight_decay=0.0001)  # this lr is used for 8 gpus
optimizer_config = dict(grad_clip=dict(max_norm=40, norm_type=2))
# learning policy
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0,
    warmup='linear',
    warmup_ratio=0.1,
    warmup_by_epoch=True,
    warmup_iters=34)
total_epochs = 196
checkpoint_config = dict(interval=4)
workflow = [('train', 1)]
evaluation = dict(
    interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20, hooks=[
        dict(type='TextLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/slowonly_r101_8x8x1_196e_kinetics400_rgb'
load_from = None
resume_from = None
find_unused_parameters = False
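
The `lr=0.1` above is tuned for 8 GPUs (see the trailing comment in the config). Under the linear scaling rule, the usual adjustment for other GPU counts looks like the sketch below; this is an illustration we added, not part of the config.

```python
# Illustration of the linear scaling rule for the config above; the base
# values come from the config, the helper itself is ours.
BASE_LR = 0.1         # optimizer lr in the config, tuned for 8 GPUs
VIDEOS_PER_GPU = 8    # data['videos_per_gpu'] in the config
BASE_GPUS = 8

def scaled_lr(num_gpus: int, videos_per_gpu: int = VIDEOS_PER_GPU) -> float:
    """Scale the learning rate proportionally to the total batch size."""
    total_batch = num_gpus * videos_per_gpu
    base_batch = BASE_GPUS * VIDEOS_PER_GPU
    return BASE_LR * total_batch / base_batch

print(scaled_lr(4))    # 0.05 for 4 GPUs
print(scaled_lr(16))   # 0.2 for 16 GPUs
```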

configs/recognition/tsn/README.md

+11 -1
@@ -36,7 +36,7 @@
|tsn_r50_320p_1x1x8_kinetics400_twostream [1: 1]* |x|x| ResNet50| ImageNet |74.64|91.77| x | x | x | x | x|x|x|
|[tsn_r50_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb.py) |340x256|8| ResNet50 | ImageNet|70.77|89.3|[68.75](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)|[88.42](https://github.com/mit-han-lab/temporal-shift-module/tree/8d53d6fda40bea2f1b37a6095279c4b454d672bd#training)|12.2 (8x10 frames)|8344| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_dense_1x1x8_100e_kinetics400_rgb_20200606-e925e6e3.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_dense_1x1x8_100e_kinetics400_rgb/20200606_003901.log.json)|
|[tsn_r50_video_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb.py) |short-side 256|8| ResNet50| ImageNet | 71.79 | 90.25 |x|x|x|21558| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_1x1x8_100e_kinetics400_rgb_20200702-568cde33.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_100e_kinetics400_rgb.log.json)|
-|[tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py) |short-side 256|8| ResNet50| ImageNet | 70.4 | 89.12 |x|x|x|21553| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb_20200703-0f19175f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log.json)|
+|[tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb.py) |short-side 256|8| ResNet50| ImageNet | 70.40 | 89.12 |x|x|x|21553| [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb_20200703-0f19175f.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log)| [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_video_dense_1x1x8_100e_kinetics400_rgb/tsn_r50_video_2d_1x1x8_dense_100e_kinetics400_rgb.log.json)|

Here, we use [1: 1] to indicate that we combine the rgb and flow scores with coefficients 1:1 to get the two-stream prediction (without applying softmax).

@@ -62,6 +62,16 @@ In data benchmark, we compare:
| [tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/data_benchmark/tsn_r50_randomresizedcrop_256p_1x1x3_100e_kinetics400_rgb.py) | short-side 256 | RandomResizedCrop | 25x10 frames | 69.80 | 89.06 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb_20200817-ae7963ca.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/20200815_172601.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_256p_randomresize_1x1x3_100e_kinetics400_rgb/20200815_172601.log.json)|
| x | short-side 256 | RandomResizedCrop | 25x3 frames | 70.48 | 89.89 | x | x | x |

### Kinetics-400 OmniSource Experiments

| config | resolution | backbone | pretrain | w. OmniSource | top1 acc | top5 acc | inference_time (video/s) | gpu_mem (M) | ckpt | log | json |
| :----------------------------------------------------------: | :------------: | :------: | :-------: | :----------------: | :------: | :------: | :----------------------: | :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| [tsn_r50_1x1x3_100e_kinetics400_rgb](/configs/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb.py) | 340x256 | ResNet50 | ImageNet | :x: | 70.6 | 89.3 | 4.3 (25x10 frames) | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth) | [log](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log) | [json](https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/20200614_063526.log.json) |
| x | 340x256 | ResNet50 | ImageNet | :heavy_check_mark: | 73.6 | 91.0 | x | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_imagenet_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-54192355.pth) | x | x |
| x | short-side 320 | ResNet50 | IG-1B [1] | :x: | 73.1 | 90.4 | x | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_without_omni_1x1x3_kinetics400_rgb_20200926-c133dd49.pth) | x | x |
| x | short-side 320 | ResNet50 | IG-1B [1] | :heavy_check_mark: | 75.7 | 91.9 | x | 8344 | [ckpt](https://download.openmmlab.com/mmaction/recognition/tsn/omni/tsn_1G1B_pretrained_r50_omni_1x1x3_kinetics400_rgb_20200926-2863fed0.pth) | x | x |

[1] We obtain the pretrained model from [torch-hub](https://pytorch.org/hub/facebookresearch_semi-supervised-ImageNet1K-models_resnext/); the pretrained model we use is `resnet50_swsl`.
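
For reference, the `resnet50_swsl` weights can be fetched with the call documented on that hub page; a minimal sketch:

```python
# Fetch the semi-weakly supervised ResNet-50 via torch.hub, using the call
# documented on the linked PyTorch hub page.
import torch

model = torch.hub.load(
    'facebookresearch/semi-supervised-ImageNet1K-models', 'resnet50_swsl')
model.eval()

# Sanity check: the head is the 1000-class ImageNet-1k classifier.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```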

### Something-Something V1

docs/changelog.md

+2
@@ -8,9 +8,11 @@
- Support to run real-time action recognition from a web camera ([#171](https://github.com/open-mmlab/mmaction2/pull/171))
- Support to export pytorch models to onnx ones. ([#160](https://github.com/open-mmlab/mmaction2/pull/160))
- Support to report mAP for ActivityNet with [CUHK17_activitynet_pred](http://activity-net.org/challenges/2017/evaluation.html). ([#176](https://github.com/open-mmlab/mmaction2/pull/176))
+- Add the data pipeline for ActivityNet, which includes downloading videos, extracting RGB and Flow frames, finetuning TSN and extracting features. ([#190](https://github.com/open-mmlab/mmaction2/pull/190))

**ModelZoo**
- Add finetuning setting for SlowOnly. ([#173](https://github.com/open-mmlab/mmaction2/pull/173))
+- Add TSN and SlowOnly models trained with [OmniSource](https://arxiv.org/abs/2003.13042), which achieve 75.7% Top-1 with TSN-R50-3seg and 80.4% Top-1 with SlowOnly-R101-8x8. ([#215](https://github.com/open-mmlab/mmaction2/pull/215))

**Improvements**
- Support to run a demo with a video url ([#165](https://github.com/open-mmlab/mmaction2/pull/165))
