This repository contains the code for the following paper:
Jiangliu Wang, Jianbo Jiao and Yunhui Liu, "Self-Supervised Video Representation Learning by Pace Prediction", In: ECCV (2020).
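For orientation, below is a minimal sketch of the pace prediction pretext task described in the paper: clips are sampled from a video at different paces (frame-sampling intervals), and the network is trained to classify which pace was used. All names here (`PACES`, `sample_clip`, `pace_prediction_step`) are illustrative and not part of this repository's API.

```python
import random
import torch
import torch.nn as nn

# Illustrative pace classes: interval 1 = normal speed; larger intervals
# simulate faster playback of the underlying video.
PACES = [1, 2, 3, 4]

def sample_clip(frames, clip_len, pace):
    """Sample `clip_len` frames at the given interval (hypothetical helper).
    `frames` is a list of HxWxC uint8 tensors decoded from one video."""
    max_start = len(frames) - clip_len * pace
    start = random.randint(0, max(0, max_start))
    return frames[start : start + clip_len * pace : pace]

def pace_prediction_step(backbone, frames, clip_len,
                         criterion=nn.CrossEntropyLoss()):
    """One self-supervised step: the backbone (any 3D CNN, e.g. C3D/R3D/
    R(2+1)D/S3D-G, with a |PACES|-way head) predicts the sampled pace."""
    label = random.randrange(len(PACES))
    clip = sample_clip(frames, clip_len, PACES[label])
    x = torch.stack(clip).permute(3, 0, 1, 2).unsqueeze(0)  # (1, C, T, H, W)
    logits = backbone(x.float())
    return criterion(logits, torch.tensor([label]))
```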
- pytorch >= 1.3.0
- tensorboardX
- cv2
- scipy
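As a quick sanity check of the environment, the snippet below prints the installed versions (assuming each package exposes `__version__`, which recent releases do):

```python
# Quick environment check: print versions of the required packages.
import torch
import cv2
import scipy
import tensorboardX

print("PyTorch:", torch.__version__)       # expected >= 1.3.0
print("OpenCV (cv2):", cv2.__version__)
print("SciPy:", scipy.__version__)
print("tensorboardX:", tensorboardX.__version__)
```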
UCF101 dataset
- Download the original UCF101 dataset from the official website, then extract RGB frames from the videos (a minimal extraction sketch follows this list).
- Alternatively, directly download the pre-processed RGB data of UCF101 here, provided by feichtenhofer.
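For reference, here is a minimal frame-extraction sketch using OpenCV; the output layout and `frame%05d.jpg` naming are assumptions, so adjust them to whatever the data-loading code expects:

```python
import os
import cv2

def extract_rgb_frames(video_path, out_dir):
    """Decode a video and save every frame as a JPEG (illustrative layout)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()   # frame is a BGR HxWx3 array
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, "frame%05d.jpg" % idx), frame)
        idx += 1
    cap.release()

# Example: extract_rgb_frames("UCF101/v_ApplyEyeMakeup_g01_c01.avi",
#                             "rgb/v_ApplyEyeMakeup_g01_c01")
```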
Train the pace prediction task on S3D-G; the default clip length is 64 and the input video size is 224 x 224.
```
python train.py --rgb_prefix RGB_DIR --gpu 0,1,2,3 --bs 32 --lr 0.001 --height 256 --width 256 --crop_sz 224 --clip_len 64
```
Train the pace prediction task on c3d/r3d/r21d; the default clip length is 16 and the input video size is 112 x 112.
```
python train.py --rgb_prefix RGB_DIR --gpu 0 --bs 30 --lr 0.001 --model c3d/r3d/r21d --height 128 --width 171 --crop_sz 112 --clip_len 16
```
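For either setting, the `--height`/`--width`/`--crop_sz`/`--clip_len` flags correspond to the usual resize-then-crop pipeline. The sketch below (illustrative code, not the repository's loader) shows the resulting tensor shapes for the default c3d/r3d/r21d configuration:

```python
import torch

# Frames are resized to height x width, then a crop_sz x crop_sz patch is
# cropped, giving clips of shape (C, clip_len, crop_sz, crop_sz).
height, width, crop_sz, clip_len = 128, 171, 112, 16

clip = torch.rand(3, clip_len, height, width)             # resized RGB clip
top = torch.randint(0, height - crop_sz + 1, (1,)).item()
left = torch.randint(0, width - crop_sz + 1, (1,)).item()
clip = clip[:, :, top:top + crop_sz, left:left + crop_sz]
print(clip.shape)  # torch.Size([3, 16, 112, 112])
```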
To be updated...
If you find this work useful or use our code, please consider citing:
```
@InProceedings{Wang20,
  author = "Jiangliu Wang and Jianbo Jiao and Yunhui Liu",
  title = "Self-Supervised Video Representation Learning by Pace Prediction",
  booktitle = "European Conference on Computer Vision",
  year = "2020",
}
```
Part of our code is adapted from the S3D-G implementation in HowTo100M; we thank the authors for their contributions.