Dataset Preparation

Processing Order

  1. Extract features: ./
  2. Generate lists:


  1. The options in the scripts have comments with the following types:
    • no comment: the user can still change the option, but this is NOT recommended (it may require code changes or lead to different experimental results)
    • comments with choices (e.g. true | false): the value must be one of the listed choices
    • comments saying "depend on users": entirely up to the user (mostly data paths)

Feature vector (frame-level) extraction loads the video dataset and extracts frame-level feature vectors, which are needed for training and validation. A labeled dataset also requires a text file listing all the category names.

Run ./

There are two modes:

  • labeled: class_file=XXX/class_list_XXX.txt. Features are generated only for the categories listed in the class_file.
  • unlabeled: class_file=none. All features are generated under the single category "unlabeled".
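
The two modes can be pictured as a filter over the category folders. The following is only a minimal sketch, not the script's actual code. It assumes the class file holds one category name per line (the real file format may differ), and load_classes and select_categories are hypothetical helper names.

```python
from pathlib import Path


def load_classes(class_file):
    """Return the category names in class_file, or None for unlabeled mode."""
    if class_file == "none":
        return None
    # Assumption: one category name per line; blank lines are ignored.
    lines = Path(class_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def select_categories(found, classes):
    """Choose the categories for which features would be generated."""
    if classes is None:
        # Unlabeled mode: everything is filed under one "unlabeled" category.
        return ["unlabeled"]
    # Labeled mode: keep only the categories named in the class file.
    return [name for name in found if name in classes]
```

For example, select_categories(["walk", "jump", "run"], ["walk", "run"]) keeps only "walk" and "run".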

Output folder structures:


Dataset split generation splits the whole dataset into two subsets according to the split ratio (randomly picking videos). Each subset has the same structure as shown above.

Run ./


  • input_type: depends on the format of the raw data
  • split_ratio: ratio of training data
  • split_feat: if the features have already been generated, the corresponding features can be split as well.
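
A random split by split_ratio can be sketched as below. This is an illustration under assumptions, not the script's implementation: split_videos and its seed argument are hypothetical, and the real script may split per category or round differently.

```python
import random


def split_videos(videos, split_ratio, seed=0):
    """Randomly split video names into (training, remaining) subsets."""
    shuffled = list(videos)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    # split_ratio is the fraction of videos assigned to the training subset.
    n_train = round(len(shuffled) * split_ratio)
    return shuffled[:n_train], shuffled[n_train:]
```

With ten videos and split_ratio=0.7, the first subset holds seven videos and the second holds three.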

Data list generation generates the data list from a video dataset. The output text file will include all the selected categories showing {video_path, frame#, class}.

Run ./


  • random_each_video: if Y, video clips are randomly selected from each raw, unsegmented video. If N, video clips are randomly selected from each category.
  • max_num: maximum number of selected video clips for each unsegmented video or category (depending on random_each_video)
  • method_read: how the number of frames in each video is computed
    • video: load from the raw video folder (slower, but more accurate)
    • frame: load from the feature folder
  • suffix: a string appended to the list file name
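
The interaction of random_each_video and max_num amounts to capped random sampling within each group, where a group is one raw video (Y) or one category (N). The sketch below is a hedged illustration only; pick_clips and the dictionary layout are assumptions, not names from the scripts.

```python
import random


def pick_clips(clips_by_group, max_num, seed=0):
    """Randomly keep at most max_num clips from each group."""
    rng = random.Random(seed)
    picked = {}
    for group, clips in clips_by_group.items():
        # Groups with fewer than max_num clips are kept whole.
        k = min(max_num, len(clips))
        picked[group] = rng.sample(list(clips), k)
    return picked
```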

Output text file (format: [video_full_path frame# class]):

data_path/RGB-Feature/VIDEO_0001/ 100 0
data_path/RGB-Feature/VIDEO_0002/ 150 1
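
Each line of the list file can be read back into its three fields. This is a minimal sketch that assumes the fields are separated by whitespace, as in the sample above; ListEntry and parse_list_line are hypothetical names.

```python
from typing import NamedTuple


class ListEntry(NamedTuple):
    video_path: str
    num_frames: int
    label: int


def parse_list_line(line):
    """Parse one 'video_full_path frame# class' line into a ListEntry."""
    # Split from the right so that a path containing spaces stays intact.
    path, frames, label = line.rsplit(maxsplit=2)
    return ListEntry(path, int(frames), int(label))
```

Applied to the first sample line, this yields the path data_path/RGB-Feature/VIDEO_0001/, a frame count of 100, and class 0.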

DA list generation

If official splits exist (e.g. UCF, HMDB), the data lists can be generated from the official splits using the DA settings chosen by the user.

Run ./