[Feature] Add config file of FLAVR (open-mmlab#867)
* [Feature] Add config file of FLAVR
* Update

1 parent b77cc3e, commit bbd7d95

Showing 4 changed files with 236 additions and 0 deletions.

configs/video_interpolators/flavr/README.md
@@ -0,0 +1,39 @@
# FLAVR (arXiv'2020)

> [FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation](https://arxiv.org/pdf/2012.08512.pdf)

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Most modern frame interpolation approaches rely on explicit bidirectional optical flows between adjacent frames, thus are sensitive to the accuracy of underlying flow estimation in handling occlusions while additionally introducing computational bottlenecks unsuitable for efficient deployment. In this work, we propose a flow-free approach that is completely end-to-end trainable for multi-frame video interpolation. Our method, FLAVR, is designed to reason about non-linear motion trajectories and complex occlusions implicitly from unlabeled videos and greatly simplifies the process of training, testing and deploying frame interpolation models. Furthermore, FLAVR delivers up to 6× speed up compared to the current state-of-the-art methods for multi-frame interpolation while consistently demonstrating superior qualitative and quantitative results compared with prior methods on popular benchmarks including Vimeo-90K, Adobe-240FPS, and GoPro. Finally, we show that frame interpolation is a competitive self-supervised pre-training task for videos via demonstrating various novel applications of FLAVR including action recognition, optical flow estimation, motion magnification, and video object tracking. Code and trained models are provided in the supplementary material.

<!-- [IMAGE] -->

<div align=center >
<img src="https://user-images.githubusercontent.com/56712176/169070212-52acdcea-d732-4441-9983-276e2e40b195.png" width="400"/>
</div >

## Results and models

Evaluated on RGB channels.
The metrics are `PSNR / SSIM`.

| Method | scale | Vimeo90k-triplet | Download |
| :------------------------------------------------------------------------------------------------------------------: | :---: | :---------------: | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| [flavr_in4out1_g8b4_vimeo90k_septuplet](/configs/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septuplet.py) | x2 | 36.3340 / 0.96015 | [model](https://download.openmmlab.com/mmediting/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septupli-c2468995.pth) \| [log](https://download.openmmlab.com/mmediting/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septupli-c2468995.log.json) |

Note: FLAVR for the x8 VFI task will be supported in the future.
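
For readers unfamiliar with the PSNR figure in the table, the sketch below computes PSNR over all RGB samples of two frames scaled to [0, 1]. It is the generic textbook definition and is not necessarily the exact routine that produced the reported numbers; the helper name `psnr_rgb` and the nested-list frame layout are assumptions made for illustration.

```python
import math


def psnr_rgb(pred, target, max_value=1.0):
    """PSNR over all RGB samples of two equally sized frames.

    Frames are nested lists shaped [height][width][3] with values in
    [0, max_value]. Hypothetical helper, not an mmediting API.
    """
    squared_error = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for pred_px, target_px in zip(pred_row, target_row):
            for p, t in zip(pred_px, target_px):
                squared_error += (p - t) ** 2
                count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


# Tiny 1x2 frame pair that differs in a single channel value.
target = [[[0.5, 0.5, 0.5], [0.2, 0.4, 0.6]]]
pred = [[[0.6, 0.5, 0.5], [0.2, 0.4, 0.6]]]
print(round(psnr_rgb(pred, target), 2))
```

Higher values mean the interpolated frame is closer to the ground truth; identical frames give an infinite PSNR.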

## Citation

```bibtex
@article{kalluri2020flavr,
  title={Flavr: Flow-agnostic video representations for fast frame interpolation},
  author={Kalluri, Tarun and Pathak, Deepak and Chandraker, Manmohan and Tran, Du},
  journal={arXiv preprint arXiv:2012.08512},
  year={2020}
}
```

174 changes: 174 additions & 0 deletions
configs/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septuplet.py
@@ -0,0 +1,174 @@
exp_name = 'flavr_in4out1_g8b4_vimeo90k_septuplet'

# model settings
model = dict(
    type='BasicInterpolator',
    generator=dict(
        type='FLAVRNet',
        num_input_frames=4,
        num_output_frames=1,
        mid_channels_list=[512, 256, 128, 64],
        encoder_layers_list=[2, 2, 2, 2],
        bias=False,
        norm_cfg=None,
        join_type='concat',
        up_mode='transpose'),
    pixel_loss=dict(type='L1Loss', loss_weight=1.0, reduction='mean'))
# model training and testing settings
train_cfg = None
test_cfg = dict(metrics=['PSNR', 'SSIM', 'MAE'], crop_border=0)

# dataset settings
train_dataset_type = 'VFIVimeo90K7FramesDataset'
val_dataset_type = 'VFIVimeo90K7FramesDataset'

train_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='target',
        channel_order='rgb',
        backend='pillow'),
    dict(type='FixedCrop', keys=['inputs', 'target'], crop_size=(256, 256)),
    dict(
        type='Flip',
        keys=['inputs', 'target'],
        flip_ratio=0.5,
        direction='horizontal'),
    dict(
        type='Flip',
        keys=['inputs', 'target'],
        flip_ratio=0.5,
        direction='vertical'),
    dict(
        type='ColorJitter',
        keys=['inputs', 'target'],
        channel_order='rgb',
        brightness=0.05,
        contrast=0.05,
        saturation=0.05,
        hue=0.05),
    dict(type='TemporalReverse', keys=['inputs'], reverse_ratio=0.5),
    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
    dict(type='FramesToTensor', keys=['inputs', 'target']),
    dict(
        type='Collect',
        keys=['inputs', 'target'],
        meta_keys=['inputs_path', 'target_path', 'key'])
]

valid_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='target',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs', 'target']),
    dict(type='FramesToTensor', keys=['inputs', 'target']),
    dict(
        type='Collect',
        keys=['inputs', 'target'],
        meta_keys=['inputs_path', 'target_path', 'key'])
]

demo_pipeline = [
    dict(
        type='LoadImageFromFileList',
        io_backend='disk',
        key='inputs',
        channel_order='rgb',
        backend='pillow'),
    dict(type='RescaleToZeroOne', keys=['inputs']),
    dict(type='FramesToTensor', keys=['inputs']),
    dict(type='Collect', keys=['inputs'], meta_keys=['inputs_path', 'key'])
]

root_dir = 'data/vimeo90k'
data = dict(
    workers_per_gpu=16,
    train_dataloader=dict(samples_per_gpu=4),  # 8 gpu
    val_dataloader=dict(samples_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=1),

    # train
    train=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/GT',
        ann_file=f'{root_dir}/sep_trainlist.txt',
        pipeline=train_pipeline,
        input_frames=[1, 3, 5, 7],
        target_frames=[4],
        test_mode=False),
    # val
    val=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/GT',
        ann_file=f'{root_dir}/sep_testlist.txt',
        pipeline=valid_pipeline,
        input_frames=[1, 3, 5, 7],
        target_frames=[4],
        test_mode=True),
    # test
    test=dict(
        type=train_dataset_type,
        folder=f'{root_dir}/GT',
        ann_file=f'{root_dir}/sep_testlist.txt',
        pipeline=valid_pipeline,
        input_frames=[1, 3, 5, 7],
        target_frames=[4],
        test_mode=True),
)
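
To make the `input_frames` / `target_frames` settings concrete, the sketch below lists which image files one septuplet would contribute. It is not part of the commit. It assumes the public Vimeo-90K naming (`im1.png` to `im7.png` inside each clip folder) and a hypothetical clip key `00001/0001`; how `VFIVimeo90K7FramesDataset` builds its paths internally is not shown here, and `frame_paths` is an illustrative helper only.

```python
from pathlib import PurePosixPath

root_dir = 'data/vimeo90k'
input_frames = [1, 3, 5, 7]   # values copied from the config
target_frames = [4]


def frame_paths(folder, clip_key, frame_indices):
    """Return image paths for the given 1-based frame indices.

    Assumes frames are stored as im<N>.png (hypothetical helper).
    """
    clip_dir = PurePosixPath(folder) / clip_key
    return [str(clip_dir / f'im{idx}.png') for idx in frame_indices]


inputs = frame_paths(f'{root_dir}/GT', '00001/0001', input_frames)
target = frame_paths(f'{root_dir}/GT', '00001/0001', target_frames)
print(inputs)
print(target)
```

The four inputs sit symmetrically around the target, two frames on each side, and the target is the frame halfway between inputs 3 and 5.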

# optimizer
optimizers = dict(generator=dict(type='Adam', lr=2e-4, betas=(0.9, 0.99)))

# learning policy
total_iters = 1000000  # >=200*64612/64
lr_config = dict(
    policy='Reduce',
    by_epoch=False,
    mode='max',
    val_metric='PSNR',
    epoch_base_valid=True,  # Support epoch base valid in iter base runner.
    factor=0.5,
    patience=10,
    cooldown=20,
    verbose=True)
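
The `lr_config` values describe a reduce-on-plateau schedule driven by validation PSNR. The class below is a minimal sketch of that idea and is not part of the commit. It follows the bookkeeping of PyTorch's `ReduceLROnPlateau` rather than the hook this config actually instantiates, so the real hook may differ in details such as improvement thresholds or how validations are counted. The class name `PlateauLR` is hypothetical.

```python
class PlateauLR:
    """Scale the learning rate down when a 'max' metric stops improving."""

    def __init__(self, lr, factor=0.5, patience=10, cooldown=20):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.cooldown = cooldown
        self.best = float('-inf')
        self.bad_evals = 0
        self.cooldown_left = 0

    def step(self, metric):
        """Update state after one validation and return the current lr."""
        if metric > self.best:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        if self.cooldown_left > 0:
            # Validations during cooldown never count as bad.
            self.cooldown_left -= 1
            self.bad_evals = 0
        if self.bad_evals > self.patience:
            self.lr *= self.factor
            self.cooldown_left = self.cooldown
            self.bad_evals = 0
        return self.lr


scheduler = PlateauLR(lr=2e-4)
scheduler.step(36.0)            # first validation sets the best PSNR
history = [scheduler.step(35.9) for _ in range(11)]  # no improvement
print(history[9], history[10])
```

With `patience=10`, the rate stays at 2e-4 through ten validations without improvement and is halved on the eleventh; the following 20 validations are then ignored for plateau counting.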

checkpoint_config = dict(interval=2020, save_optimizer=True, by_epoch=False)

evaluation = dict(interval=2020, save_image=False, gpu_collect=True)
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=False),
        dict(
            type='TensorboardLoggerHook',
            log_dir=f'work_dirs/{exp_name}/tb_log/',
            interval=100,
            ignore_last=False,
            reset_flag=False,
            by_epoch=False),
    ])
visual_config = None

# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = f'./work_dirs/{exp_name}'
load_from = None
resume_from = None
workflow = [('train', 1)]
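
The interval and iteration numbers in this config appear to line up with a simple epoch calculation, sketched below. This is not part of the commit. It assumes 64612 training clips (the figure in the `total_iters` comment) and 8 GPUs with 4 samples each (from the `# 8 gpu` comment and `g8b4` in the experiment name); neither number is verified against the dataset here.

```python
import math

num_train_clips = 64612      # assumption: taken from the total_iters comment
num_gpus = 8                 # assumption: from the '# 8 gpu' comment
samples_per_gpu = 4          # train_dataloader setting in the config
total_iters = 1000000

batch_size = num_gpus * samples_per_gpu
iters_per_epoch = math.ceil(num_train_clips / batch_size)
approx_epochs = total_iters / iters_per_epoch

print(batch_size)        # 32
print(iters_per_epoch)   # 2020
print(round(approx_epochs, 1))
```

Under these assumptions one epoch is 2020 iterations, which matches `interval=2020` in both `checkpoint_config` and `evaluation`, so checkpoints and validation would run roughly once per epoch.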

@@ -0,0 +1,22 @@
Collections:
- Metadata:
    Architecture:
    - FLAVR
  Name: FLAVR
  Paper:
  - https://arxiv.org/pdf/2012.08512.pdf
  README: configs/video_interpolators/flavr/README.md
Models:
- Config: configs/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septuplet.py
  In Collection: FLAVR
  Metadata:
    Training Data: VIMEO90K
  Name: flavr_in4out1_g8b4_vimeo90k_septuplet
  Results:
  - Dataset: VIMEO90K
    Metrics:
      Vimeo90k-triplet:
        PSNR: 36.334
        SSIM: 0.96015
    Task: Video_interpolators
  Weights: https://download.openmmlab.com/mmediting/video_interpolators/flavr/flavr_in4out1_g8b4_vimeo90k_septupli-c2468995.pth