
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency

Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Rong Jin, Changxin Gao, Nong Sang
In CVPR, 2022. [Paper]. [Website]



Latest

[2022-08] Code is available!

This repo is a modification of the TAdaConv repo.

Installation

Requirements:

  • Python>=3.6
  • torch>=1.5
  • torchvision (version matching torch)
  • simplejson==3.11.1
  • decord>=0.6.0
  • pyyaml
  • einops
  • oss2
  • psutil
  • tqdm
  • pandas

Optional requirements:

  • fvcore (for flops calculation)
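
The required packages can be installed with pip; a typical command (assuming the names above match their PyPI packages) is:

pip install "torch>=1.5" torchvision simplejson==3.11.1 "decord>=0.6.0" pyyaml einops oss2 psutil tqdm pandas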

Data Preparation

A single long untrimmed video is often bulky, and decoding the full video directly is inefficient. To improve pre-training efficiency, we recommend sequentially cutting each long video into multiple short clips (5s per clip). Take the first training video in the HACS dataset (i.e., v_--0edUL8zmA.mp4) as an example: its duration is 92s, so its pre-training clips are expected to appear in the train/ folder as follows:

/path/to/hacs-clips/
  train/
    # 1st video
    v_--0edUL8zmA_0000000_0005000.mp4
    v_--0edUL8zmA_0005000_0010000.mp4
    v_--0edUL8zmA_0010000_0015000.mp4
    v_--0edUL8zmA_0015000_0020000.mp4
    ...
    v_--0edUL8zmA_0085000_0090000.mp4
    v_--0edUL8zmA_0090000_0092120.mp4
    # 2nd video
    v_--8jh-DkPK4_0000000_0005000.mp4
    v_--8jh-DkPK4_0005000_0010000.mp4
    v_--8jh-DkPK4_0010000_0015000.mp4
    ...
    v_--8jh-DkPK4_0215000_0220000.mp4
    v_--8jh-DkPK4_0220000_0220420.mp4
    ...

Each video is segmented into multiple 5s segments, with start and end timestamps (in milliseconds) encoded in the file names; this can be achieved easily with FFMPEG (a sketch is given after the example below). To facilitate decoding multiple consecutive segments in pre-training, we also need to prepare an additional file that records the video name and the start and end time of each segment (i.e., training.txt):

--0edUL8zmA,0000000,0005000
--0edUL8zmA,0005000,0010000
--0edUL8zmA,0010000,0015000
--0edUL8zmA,0015000,0020000
...
--0edUL8zmA,0085000,0090000
--0edUL8zmA,0090000,0092120
--8jh-DkPK4,0000000,0005000
--8jh-DkPK4,0005000,0010000
--8jh-DkPK4,0010000,0015000
...
--8jh-DkPK4,0215000,0220000
--8jh-DkPK4,0220000,0220420
...
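
As a minimal sketch of this preprocessing (our assumption of the workflow, not code from this repo: the clip naming and the dropped v_ prefix in training.txt mirror the examples above, and ffmpeg/ffprobe must be on your PATH):

import subprocess
from pathlib import Path

CLIP_MS = 5000  # 5s per clip, as recommended above

def video_duration_ms(path):
    """Read a video's duration in milliseconds with ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(path)],
        stdout=subprocess.PIPE, universal_newlines=True, check=True).stdout
    return int(float(out) * 1000)

def segment_video(src, out_dir, anno):
    """Cut one long video into sequential 5s clips and log each clip."""
    stem = src.stem  # e.g. "v_--0edUL8zmA"
    total = video_duration_ms(src)
    for start in range(0, total, CLIP_MS):
        end = min(start + CLIP_MS, total)
        clip = out_dir / f"{stem}_{start:07d}_{end:07d}.mp4"
        # Stream copy ("-c copy") is fast but cuts on keyframes;
        # drop it to re-encode if you need frame-accurate boundaries.
        subprocess.run(
            ["ffmpeg", "-y", "-ss", f"{start / 1000:.3f}", "-i", str(src),
             "-t", f"{(end - start) / 1000:.3f}", "-c", "copy", str(clip)],
            check=True)
        # training.txt drops the "v_" prefix, as in the listing above.
        anno.write(f"{stem[2:]},{start:07d},{end:07d}\n")

if __name__ == "__main__":
    src_dir = Path("/path/to/hacs-videos")       # original untrimmed videos
    out_dir = Path("/path/to/hacs-clips/train")  # clip output folder
    out_dir.mkdir(parents=True, exist_ok=True)
    with open("training.txt", "w") as anno:
        for src in sorted(src_dir.glob("*.mp4")):
            segment_video(src, out_dir, anno)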

This file should be placed in the cfg.DATA.ANNO_DIR directory. The HACS dataset is a large-scale untrimmed video dataset for the temporal action localization task; each of its videos contains at least one action instance as well as background segments. If you have limited resources, we recommend validating your ideas on the HACS dataset first. For the untrimmed version of the Kinetics400 dataset, simply leave the original videos uncut.
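
For intuition, this is roughly how such an index lets a loader locate consecutive segments of the same untrimmed video (a hypothetical helper, not the repo's actual loader):

import csv
from collections import defaultdict

def load_clip_index(anno_path):
    """Group (start_ms, end_ms) pairs by video id, in temporal order."""
    clips = defaultdict(list)
    with open(anno_path) as f:
        for row in csv.reader(f):
            if row:
                vid, start, end = row
                clips[vid].append((int(start), int(end)))
    for segments in clips.values():
        segments.sort()
    return clips

clips = load_clip_index("training.txt")
# e.g. clips["--0edUL8zmA"][:2] -> [(0, 5000), (5000, 10000)]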

Model Zoo

We include our pre-trained models in MODEL_ZOO.md.

Running instructions

To train the model with HiCo, set the _BASE_RUN to point to configs/pool/run/training/simclr.yaml. See configs/projects/hico/simclr_*_s3dg.yaml for more details. Alternatively, you can find pre-trained models in MODEL_ZOO.md.
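
That is, the pre-training config contains a line like the following (the path is taken from above; its exact position in the yaml is our assumption):

_BASE_RUN: configs/pool/run/training/simclr.yaml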

For detailed explanations on the approach itself, please refer to the paper.

For an example run, set the DATA_ROOT_DIR and ANNO_DIR in configs/projects/hico/simclr_hacs_s3dg.yaml, and OUTPUT_DIR in configs/projects/hico/pt-hacs/s3dg-hico-s.yaml. As a rough sketch (the key names are from above; the exact nesting is our assumption, with cfg.DATA.ANNO_DIR suggesting the data keys live under DATA), the entries to edit look like:
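
DATA:
  DATA_ROOT_DIR: /path/to/hacs-clips/  # folder holding train/ with the 5s clips
  ANNO_DIR: /path/to/annotations/      # folder holding training.txt
OUTPUT_DIR: /path/to/output/

Then run this command for the short pre-training (used for ablation studies):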

python runs/run.py --cfg configs/projects/hico/pt-hacs/s3dg-hico-s.yaml

Run this command for the long pre-training:

python runs/run.py --cfg configs/projects/hico/pt-hacs/s3dg-hico-l.yaml

Citing HiCo

If you find HiCo useful for your research, please consider citing the paper as follows:

@inproceedings{qing2022hico,
  title={Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency},
  author={Qing, Zhiwu and Zhang, Shiwei and Huang, Ziyuan and Xu, Yi and Wang, Xiang and Tang, Mingqian and Gao, Changxin and Jin, Rong and Sang, Nong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13821--13831},
  year={2022}
}
