This package was developed by Mr. Chen-Lin Zhang (zhangcl@lamda.nju.edu.cn). For any problem concerning the code, please feel free to contact Mr. Zhang. This package is free for academic use; run it at your own risk. For other purposes, please contact Prof. Jianxin Wu (wujx2001@gmail.com).
python_speech_features: https://github.com/jameslyons/python_speech_features
First, please put the test video files into the /test folder, organized in the same way as the training data. For example: /test/testing80_01/a.mp4
Then, please run readdata_test.m in MATLAB. It extracts frames from the videos and saves them as /test_jpg/a.mp4/1.jpg, etc.
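For readers who want to inspect this step outside MATLAB, below is a minimal Python sketch of the same frame-extraction logic (OpenCV-based and purely illustrative; the package itself performs this step with readdata_test.m, so the exact frame-naming scheme should be checked against that script):

    # frame_extract_sketch.py -- illustrative only; the package uses readdata_test.m
    import os
    import cv2  # pip install opencv-python

    def extract_frames(video_path, out_dir):
        # Save every frame of video_path as out_dir/1.jpg, out_dir/2.jpg, ...
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            idx += 1
            cv2.imwrite(os.path.join(out_dir, '%d.jpg' % idx), frame)
        cap.release()

    # e.g. /test/testing80_01/a.mp4 -> /test_jpg/a.mp4/1.jpg, 2.jpg, ...
    extract_frames('test/testing80_01/a.mp4', 'test_jpg/a.mp4')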
Suppose you have downloaded all the required data into the current folder; it will consist of many *.zip files.
sh extract.sh # Unzip all the *.zip files.
sh mp42wav.sh # Extract wav files from the mp4 files in the current directory (a Python sketch of this step is given after the command list).
python wav2logfbank.py # Compute log filterbank features from the wav files (sketched after the command list).
cd ../data_logfbank/
th train2torch.lua # Merge the training csv files and save them to disk (see the sketch after the command list).
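For reference, a rough Python sketch of the mp4-to-wav step above: it simply shells out to ffmpeg. The codec, mono channel and 16 kHz sampling rate are assumptions here, so please check mp42wav.sh for the options the package actually uses.

    # mp42wav_sketch.py -- illustrative; the package ships mp42wav.sh for this step
    import glob
    import os
    import subprocess

    for mp4 in glob.glob('*.mp4'):
        wav = os.path.splitext(mp4)[0] + '.wav'
        # -vn drops the video stream; pcm_s16le / mono / 16 kHz are assumed settings
        subprocess.check_call(['ffmpeg', '-y', '-i', mp4, '-vn',
                               '-acodec', 'pcm_s16le', '-ac', '1',
                               '-ar', '16000', wav])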
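wav2logfbank.py builds on the python_speech_features package listed at the top. A minimal sketch of the core computation is shown below; the file names and filterbank parameters (the library defaults) are assumptions, not necessarily the settings used by the script.

    # logfbank_sketch.py -- illustrative use of python_speech_features
    import numpy as np
    import scipy.io.wavfile as wav
    from python_speech_features import logfbank

    rate, signal = wav.read('a.wav')          # hypothetical input file
    # 26 log Mel filterbank energies per 25 ms frame with a 10 ms hop
    feat = logfbank(signal, samplerate=rate, winlen=0.025, winstep=0.01, nfilt=26)
    np.savetxt('a.csv', feat, delimiter=',')  # one frame per row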
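Conceptually, train2torch.lua loads the per-file csv features and saves them together in one file on disk (presumably in Torch's own format). A rough Python analogue, with an assumed file layout and shown only for illustration, would be:

    # merge_sketch.py -- rough Python analogue of the merge step (illustrative)
    import glob
    import os
    import numpy as np

    # Collect the frame-level features of every training csv into one file
    feats = {os.path.splitext(os.path.basename(f))[0]: np.loadtxt(f, delimiter=',')
             for f in sorted(glob.glob('*.csv'))}
    np.savez('train_features.npz', **feats)   # stand-in for the Torch output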
First, on line 2 of eval_val_reg_resnet.m, eval_val_reg_avg_max.m and eval_val_reg_avg_max_l28.m, please change the MatConvNet path to the location of your own MatConvNet installation.
Then, please run these three MATLAB scripts. They produce predictions_reg_avg_max_l28.csv, predictions_reg_avg_max.csv and predictions_reg_res.csv.
th predict.lua # Make predictions on the test data.
# Note that the y_test.csv file must be provided.
# The final predictions file is predictions.csv.