We currently release the code and models for:
- Masked Pretraining
- Short-term Video Understanding
  - K400 and SthSthV2
- Long-term Video Understanding
  - Breakfast, COIN and LVU
- 🔥 03/12/2024: Pretrained models on ImageNet-1K are released.
You can find the dataset instructions in DATASET.
You can find all the models and the scripts in MODEL_ZOO.
We use CLIP pretrained models as the unmasked teachers by default:
- Follow extract.ipynb to extract visual encoder from CLIP.
- Change `MODEL_PATH` in clip.py.
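
As a rough illustration of the extraction step, the sketch below keeps only the visual-encoder weights from a CLIP-style state dict. It is not the content of extract.ipynb: the function name `extract_visual_encoder` is hypothetical, the toy checkpoint stands in for a real one, and stripping the `visual.` prefix is an assumption about the format the teacher expects.

```python
from collections import OrderedDict


def extract_visual_encoder(state_dict, prefix="visual."):
    """Keep only entries whose key starts with `prefix`, with the prefix stripped.

    Hypothetical helper: the actual notebook may keep the prefix or rename keys.
    """
    visual = OrderedDict()
    for name, value in state_dict.items():
        if name.startswith(prefix):
            visual[name[len(prefix):]] = value
    return visual


# Toy stand-in for a CLIP checkpoint (real values would be tensors).
toy_clip = {
    "visual.conv1.weight": [0.1, 0.2],
    "visual.proj": [0.3],
    "transformer.resblocks.0.attn.in_proj_weight": [0.4],
    "logit_scale": [4.6],
}

visual_only = extract_visual_encoder(toy_clip)
print(list(visual_only))  # → ['conv1.weight', 'proj']
```

With a real checkpoint, the filtered dict would then be saved to disk and its location used as `MODEL_PATH`.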
For training, you can simply run the pretraining scripts as follows:
bash ./exp/k400/videomamba_middle_mask/run_mask_pretrain.sh
Notes:
- Change `DATA_PATH` to your data path before running the scripts.
- `--sampling_rate` is set to 1 for sparse sampling.
- The latest checkpoint is saved automatically during training, so we use a large `--save_ckpt_freq`.
- For VideoMamba-M, we use CLIP-B-ViT as the teacher.
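
To make the sparse sampling note concrete, here is a minimal sketch that assumes "sparse sampling" means splitting the video into equal segments and taking one frame from each. The function `sparse_sample_indices` is hypothetical and picks the segment center; the dataset code in this repository may differ, for example by adding random offsets during training.

```python
def sparse_sample_indices(num_video_frames, num_segments):
    """Return one frame index per segment, taken near the segment center."""
    if num_video_frames <= 0 or num_segments <= 0:
        raise ValueError("num_video_frames and num_segments must be positive")
    segment_len = num_video_frames / num_segments
    indices = []
    for i in range(num_segments):
        center = int(segment_len * i + segment_len / 2)
        # Clamp so that very short videos never index past the last frame.
        indices.append(min(center, num_video_frames - 1))
    return indices


print(sparse_sample_indices(80, 8))  # → [5, 15, 25, 35, 45, 55, 65, 75]
```

The sampled frames cover the whole video regardless of its length, which is the property that distinguishes sparse sampling from taking a dense run of consecutive frames.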
For finetuning, you can simply run the fine-tuning scripts as follows:
bash ./exp/k400/videomamba_middle_mask/run_f8x224.sh
Notes:
- Change `DATA_PATH` and `PREFIX` to your data path before running the scripts.
- Set `--finetune` when using a masked pretrained model.
- The best checkpoint will be automatically evaluated with `--test_best`.
- Set `--test_num_segment` and `--test_num_crop` for different evaluation strategies.
- To only run evaluation, just set `--eval`.
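
The evaluation flags can be read as controlling multi-view testing. The sketch below assumes the common convention that each video is scored on `test_num_segment × test_num_crop` views and that the per-view scores are averaged; this has not been checked against the repository's evaluation code. The helper `aggregate_views` and the toy scores are illustrative only.

```python
def aggregate_views(view_scores):
    """Average per-class scores over all views of a single video."""
    num_views = len(view_scores)
    num_classes = len(view_scores[0])
    return [
        sum(view[c] for view in view_scores) / num_views
        for c in range(num_classes)
    ]


# Hypothetical settings: 2 temporal segments x 3 spatial crops = 6 views.
test_num_segment, test_num_crop = 2, 3
num_views = test_num_segment * test_num_crop

# Toy per-view class scores for one video (3 classes).
view_scores = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
assert len(view_scores) == num_views

averaged = aggregate_views(view_scores)
predicted = max(range(len(averaged)), key=averaged.__getitem__)
print(predicted)  # → 1
```

More views usually trade evaluation time for a more stable prediction, which is why the two flags are exposed separately.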
For Breakfast and COIN, you can simply run the fine-tuning scripts as follows:
bash ./exp/breakfast/videomamba_middle_mask/run_f32x224.sh
LVU contains both classification and regression tasks. You can run the fine-tuning scripts as follows:
# classification
bash ./exp/lvu/run_class.sh
# regression
bash ./exp/lvu/run_regression.sh
Notes: For regression tasks, the data should be preprocessed with normalization as in ViS4mer.
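
As a sketch of what such preprocessing could look like, the example below standardizes regression targets to zero mean and unit variance using training-set statistics. This is an assumption: the exact normalization used in ViS4mer is not reproduced here, and the helper names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(train_targets):
    """Compute mean and population standard deviation on the training targets."""
    mu = mean(train_targets)
    sigma = pstdev(train_targets)
    if sigma == 0:
        raise ValueError("targets are constant; cannot standardize")
    return mu, sigma


def standardize(values, mu, sigma):
    """Map raw targets to zero-mean, unit-variance values."""
    return [(v - mu) / sigma for v in values]


def destandardize(values, mu, sigma):
    """Map model outputs back to the original target scale."""
    return [v * sigma + mu for v in values]


# Toy training targets (illustrative values only).
train_targets = [2.0, 4.0, 6.0, 8.0]
mu, sigma = fit_standardizer(train_targets)
normalized = standardize(train_targets, mu, sigma)
restored = destandardize(normalized, mu, sigma)
```

Statistics are fitted on the training split only and reused for validation and test data, so that no information leaks from the evaluation sets.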
By default, we use the `Kinetics_sparse` dataset for all datasets. However, the ViS4mer authors use trimmed clips with a sliding window, which may improve the results. We also provide a sliding-window dataset, which you can run as follows:
# classification
bash ./exp/lvu/run_class_trim.sh
# regression
bash ./exp/lvu/run_regression_trim.sh
Notes:
- Set `trimmed` for the length of trimmed videos.
- Set `time_stride` for the length of sliding window.
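
The following sketch shows one way these two settings could interact. It assumes `trimmed` is the length of each clip and `time_stride` is the step between the starts of consecutive clips, both in the same unit (for example seconds). The function `sliding_windows` is hypothetical and is not taken from the repository's dataset code.

```python
def sliding_windows(video_length, trimmed, time_stride):
    """Return (start, end) pairs, end exclusive, for clips of length `trimmed`."""
    if trimmed <= 0 or time_stride <= 0:
        raise ValueError("trimmed and time_stride must be positive")
    if video_length <= trimmed:
        # The video is shorter than one window: use it as a single clip.
        return [(0, video_length)]
    starts = range(0, video_length - trimmed + 1, time_stride)
    return [(start, start + trimmed) for start in starts]


windows = sliding_windows(video_length=100, trimmed=60, time_stride=20)
print(windows)  # → [(0, 60), (20, 80), (40, 100)]
```

A smaller `time_stride` yields more, heavily overlapping clips per video, which increases the number of training samples at the cost of redundancy.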