We have successfully pre-trained and fine-tuned our SIGMA on Kinetics400, Something-Something-V2, UCF101 and HMDB51 with this codebase.
- The pre-processing of Something-Something-V2 can be summarized into 3 steps:

  - Download the dataset from the official website.

  - Preprocess the dataset by converting the videos from `.webm` to `.mp4`, keeping the original height of 240px.

  - Generate the annotations needed by the dataloader (each line has the form `<path_to_video> <video_class>`). The annotations usually include `train.csv`, `val.csv` and `test.csv` (here `test.csv` is the same as `val.csv`). We share our annotation files (`train.csv`, `val.csv`, `test.csv`) via Google Drive. The format of a `*.csv` file is:

        dataset_root/video_1.mp4 label_1
        dataset_root/video_2.mp4 label_2
        dataset_root/video_3.mp4 label_3
        ...
        dataset_root/video_N.mp4 label_N
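The conversion and annotation steps for Something-Something-V2 could be scripted roughly as follows. This is a minimal sketch, not the script used for SIGMA: the helper names `webm_to_mp4_command` and `annotation_lines` are hypothetical, the `libx264` encoder is an assumed choice, and `ffmpeg` is assumed to be installed. No scale filter is passed, so the original 240px height is kept.

```python
from pathlib import Path


def webm_to_mp4_command(src, dst_dir):
    """Build an ffmpeg command that re-encodes one .webm file as .mp4.

    The encoder (libx264) is an assumption; no resizing is requested,
    so the video keeps its original resolution.
    """
    src = Path(src)
    dst = Path(dst_dir) / (src.stem + ".mp4")
    return ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264", str(dst)]


def annotation_lines(samples):
    """Turn (video_path, label) pairs into '<path_to_video> <video_class>' lines."""
    return [f"{video_path} {label}" for video_path, label in samples]


def write_annotation(samples, csv_path):
    """Write one annotation file (e.g. train.csv) in the space-separated format."""
    text = "\n".join(annotation_lines(samples)) + "\n"
    Path(csv_path).write_text(text, encoding="utf-8")
```

To run a conversion, pass the returned list to `subprocess.run(cmd, check=True)`. How the `(video_path, label)` pairs are obtained from the dataset's label files is left out here, since it depends on the release you download.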
- The pre-processing of Kinetics400 can be summarized into 3 steps:

  - Download the dataset from the official website.

  - Preprocess the dataset by resizing the short edge of the videos to 320px. You can refer to the MMAction2 Data Benchmark for TSN and SlowOnly.

  - Generate the annotations needed by the dataloader (each line has the form `<path_to_video> <video_class>`). The annotations usually include `train.csv`, `val.csv` and `test.csv` (here `test.csv` is the same as `val.csv`). The format of a `*.csv` file is:

        dataset_root/video_1.mp4 label_1
        dataset_root/video_2.mp4 label_2
        dataset_root/video_3.mp4 label_3
        ...
        dataset_root/video_N.mp4 label_N
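The short-edge resize for Kinetics400 comes down to a small piece of arithmetic: scale the shorter side to 320px and scale the longer side by the same factor. The sketch below illustrates that calculation and wraps it in an `ffmpeg` command. The function names are hypothetical, rounding to even dimensions and the `libx264` encoder are assumptions, and the input width and height are assumed to be known already (for example from `ffprobe`).

```python
def _nearest_even(value):
    """Round to the nearest even integer (many H.264 setups need even sizes)."""
    return max(2, 2 * round(value / 2))


def short_edge_size(width, height, short_edge=320):
    """Return (new_width, new_height) with the shorter side equal to short_edge."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width <= height:
        # Portrait or square: width is the short edge.
        return short_edge, _nearest_even(height * short_edge / width)
    # Landscape: height is the short edge.
    return _nearest_even(width * short_edge / height), short_edge


def resize_command(src, dst, width, height, short_edge=320):
    """Build an ffmpeg command that resizes src so its short edge is short_edge."""
    new_w, new_h = short_edge_size(width, height, short_edge)
    return ["ffmpeg", "-y", "-i", src,
            "-vf", f"scale={new_w}:{new_h}",
            "-c:v", "libx264", dst]


# A 1280x720 video becomes 568x320; a 720x1280 video becomes 320x568.
print(short_edge_size(1280, 720))
print(short_edge_size(720, 1280))
```

The aspect ratio is preserved up to the even-size rounding, so a 16:9 clip ends up 568px wide rather than exactly 568.9px.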
- We use decord to decode the videos on the fly during both pre-training and fine-tuning phases.
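As an illustration of on-the-fly decoding, the sketch below opens a video with decord's `VideoReader` and fetches only the frames that are needed, so no frames have to be extracted to disk beforehand. The uniform sampling strategy, the default of 16 frames, and the function names are assumptions made for this example; they are not necessarily what the SIGMA dataloader does.

```python
def uniform_frame_indices(total_frames, num_frames):
    """Pick num_frames indices spread evenly over a video of total_frames frames."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Integer arithmetic keeps every index inside [0, total_frames - 1].
    return [i * total_frames // num_frames for i in range(num_frames)]


def load_clip(video_path, num_frames=16):
    """Decode a clip directly from the video file.

    Returns a uint8 array of shape (num_frames, height, width, 3).
    """
    # Imported lazily so the index helper can be used without decord installed.
    from decord import VideoReader, cpu

    reader = VideoReader(video_path, ctx=cpu(0))
    indices = uniform_frame_indices(len(reader), num_frames)
    return reader.get_batch(indices).asnumpy()
```

Calling `load_clip("dataset_root/video_1.mp4")` on a path taken from an annotation file would return the sampled frames for that video. If a video is shorter than `num_frames`, some indices repeat.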