- [01/14/2022] Raw videos uploaded to Google Drive. For access, please send us an e-mail: zxwu at fudan.edu.cn
- [10/29/2021] Features uploaded to Google Drive. For access, please send us an e-mail: zxwu at fudan.edu.cn
- [09/28/2021] Features uploaded to Aliyun Drive (deprecated). For access, please send us an e-mail: zxwu at fudan.edu.cn
- [08/23/2021] Checkpoint links uploaded. Sorry, we are dealing with campus network bandwidth limitations; the dataset will be released this week.
- [08/15/2021] Code released. Dataset download links and checkpoints links will be updated in a week.
- [07/29/2021] Dataset released, visit https://videolt.github.io/ for downloading.
- [07/23/2021] VideoLT is accepted by ICCV2021.
VideoLT is a large-scale long-tailed video recognition dataset, intended as a step toward real-world video recognition. This repo provides the VideoLT dataset and long-tailed baselines.
Please be aware that VideoLT is for non-commercial use only. To obtain it, please send an e-mail to zxwu at fudan.edu.cn and agree to our license; we will then send you the download links. We provide raw videos (~1.7TB) and extracted features (~900GB in total, ~295GB for each feature type).
To decompress the .tar.gz files, please use the following commands:
cat TSM-R50-feature.tar.gz.part* | tar zx
cat ResNet50-feature.tar.gz.part* | tar zx
cat ResNet101-feature.tar.gz.part* | tar zx
To use the extracted features, modify dataset/dutils.py and set the correct path to the features.
The baseline scripts and checkpoints are provided in MODELZOO.md.
FrameStack is a simple yet effective approach for long-tailed video recognition. It re-samples training data at the frame level and adopts a dynamic sampling strategy based on knowledge learned by the network. The rationale behind FrameStack is to dynamically sample more frames from videos in tail classes and fewer frames from videos in head classes.
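The frame-level idea can be illustrated with a minimal sketch. It is not the repository's implementation: it assumes the "knowledge learned by the network" is summarized as a per-class running score in [0, 1] (for example, a running average precision), and it uses a simple linear mapping from that score to a frame budget. The names `frames_to_sample`, `pick_frame_indices`, `class_scores`, `min_frames` and `max_frames` are hypothetical.

```python
def frames_to_sample(class_scores, video_labels, min_frames, max_frames):
    """Map how well a video's classes are learned to a frame budget.

    class_scores: dict of class_id -> running score in [0, 1]
                  (hypothetical; assumed to be tracked during training).
    video_labels: non-empty list of class ids attached to the video.
    Assumes 1 <= min_frames <= max_frames.
    """
    # Average the scores over the video's labels, in case it has several.
    score = sum(class_scores[c] for c in video_labels) / len(video_labels)
    # Linear interpolation (an assumption of this sketch):
    # score 1.0 -> min_frames, score 0.0 -> max_frames.
    budget = round(max_frames - score * (max_frames - min_frames))
    # Clamp to the allowed range to guard against rounding drift.
    return max(min_frames, min(max_frames, budget))


def pick_frame_indices(total_frames, budget):
    """Choose `budget` evenly spaced frame indices from a video.

    Assumes total_frames >= 1 and budget >= 1.
    """
    budget = min(budget, total_frames)
    step = total_frames / budget
    return [int(i * step) for i in range(budget)]
```

With this mapping, a video from a well-learned (typically head) class receives a budget close to `min_frames`, while a video from a poorly learned (typically tail) class receives a budget close to `max_frames`.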
pip install -r requirements.txt
- Modify FEATURE_NAME, PATH_TO_FEATURE and FEATURE_DIM in dataset/dutils.py.
- Set ROOT in dataset/dutils.py to the labels folder. The directory structure is:
labels
|-- count-labels-train.lst
|-- test.lst
|-- test_videofolder.txt
|-- train.lst
|-- train_videofolder.txt
|-- val_videofolder.txt
`-- validate.lst
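Before training, it can help to confirm that the folder ROOT points to matches the layout above. The following is a small sketch for that check; the helper `missing_label_files` is hypothetical and not part of this repo, and the file list is copied from the directory structure shown above.

```python
from pathlib import Path

# File names taken from the directory structure listed above.
EXPECTED_LABEL_FILES = [
    "count-labels-train.lst",
    "test.lst",
    "test_videofolder.txt",
    "train.lst",
    "train_videofolder.txt",
    "val_videofolder.txt",
    "validate.lst",
]


def missing_label_files(root):
    """Return the expected label files that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_LABEL_FILES if not (root / name).is_file()]
```

For example, `missing_label_files("labels")` returns an empty list when every file is in place, and otherwise the names that still need to be downloaded or moved.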
We provide scripts for training. Please refer to MODELZOO.md.
Example training scripts:
FEATURE_NAME='ResNet101'
export CUDA_VISIBLE_DEVICES='2'
python base_main.py \
--augment "mixup" \
--feature_name $FEATURE_NAME \
--lr 0.0001 \
--gd 20 --lr_steps 30 60 --epochs 100 \
--batch-size 128 -j 16 \
--eval-freq 5 \
--print-freq 20 \
--root_log=$FEATURE_NAME-log \
--root_model=$FEATURE_NAME'-checkpoints' \
--store_name=$FEATURE_NAME'_bs128_lr0.0001_lateavg_mixup' \
--num_class=1004 \
--model_name=NonlinearClassifier \
--train_num_frames=60 \
--val_num_frames=150 \
--loss_func=BCELoss
Note: args.resample, args.augment and args.loss_func can be set together to combine multiple long-tailed strategies.
Options:
args.resample: ['None', 'CBS','SRS']
args.augment : ['None', 'mixup', 'FrameStack']
args.loss_func: ['BCELoss', 'LDAM', 'EQL', 'CBLoss', 'FocalLoss']
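As an illustration of how these three options combine, here is a minimal argparse sketch that restricts each one to the values listed above. It is an assumption about how such options could be declared, not a copy of the argument parser in base_main.py: the `--resample` flag spelling and all default values are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser exposing the three long-tailed strategy options."""
    parser = argparse.ArgumentParser(
        description="Sketch of the long-tailed strategy options")
    # Re-sampling strategy (flag name and default are assumptions).
    parser.add_argument("--resample", default="None",
                        choices=["None", "CBS", "SRS"])
    # Augmentation strategy (default is an assumption).
    parser.add_argument("--augment", default="None",
                        choices=["None", "mixup", "FrameStack"])
    # Loss function (default is an assumption).
    parser.add_argument("--loss_func", default="BCELoss",
                        choices=["BCELoss", "LDAM", "EQL", "CBLoss", "FocalLoss"])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.resample, args.augment, args.loss_func)
```

Declaring the allowed values through `choices` makes argparse reject a misspelled strategy name at startup instead of failing later during training.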
We provide scripts for testing in scripts. Set CKPT to the path of a saved checkpoint.
Example testing scripts:
FEATURE_NAME='ResNet101'
CKPT='VideoLT_checkpoints/ResNet-101/ResNet101_bs128_lr0.0001_lateavg_mixup/ckpt.best.pth.tar'
export CUDA_VISIBLE_DEVICES='1'
python base_test.py \
--resume $CKPT \
--feature_name $FEATURE_NAME \
--batch-size 128 -j 16 \
--print-freq 20 \
--num_class=1004 \
--model_name=NonlinearClassifier \
--train_num_frames=60 \
--val_num_frames=150 \
--loss_func=BCELoss
If you find VideoLT helpful for your research, please consider citing:
@InProceedings{Zhang_2021_ICCV,
author = {Zhang, Xing and Wu, Zuxuan and Weng, Zejia and Fu, Huazhu and Chen, Jingjing and Jiang, Yu-Gang and Davis, Larry S.},
title = {VideoLT: Large-Scale Long-Tailed Video Recognition},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {7960-7969}
}