PyTorch implementation of Elliott Loveridge's master's thesis, including code for video classification and model compression.
- PyTorch 1.0.1.post2
- OpenCV
- FFmpeg, FFprobe
- Python 3
- Distiller
- Docker
Pre-trained models can be downloaded from a Google Drive here.
Tested models:
- 3D MobileNetV2
- 3D ResNet
- 3D CSN
MobileNetV2's complexity may be adjusted via the '--width_mult' argument (the width multiplier), while ResNet and CSN each accept a choice of '--model_depth'.
If using Bath University's OGG, data has been saved in the following dir:
- Download videos and train/test splits here.
- Convert from avi to jpg files using
python utils/ avi_video_directory jpg_video_directory
- Generate n_frames files using
python utils/ jpg_video_directory
- Generate an annotation file in JSON format, similar to ActivityNet, using
  - annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt
python utils/ annotation_dir_path
All experiments use annotation file 1, saved within this repo.
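To make the annotation step more concrete, the sketch below builds an ActivityNet-style dictionary from the UCF101 split files. It is an illustration rather than the repository's script: the function names and the JSON layout (a 'labels' list plus a 'database' keyed by video name) are assumptions, and the code relies only on the standard UCF101 line formats, such as '1 ApplyEyeMakeup' in classInd.txt and 'ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1' in the train lists.

```python
from pathlib import Path


def load_labels(class_ind_path):
    """Read class names from classInd.txt (lines like '1 ApplyEyeMakeup')."""
    labels = []
    for line in Path(class_ind_path).read_text().splitlines():
        if line.strip():
            _, name = line.split()
            labels.append(name)
    return labels


def read_split(list_path, subset):
    """Map video name -> entry for one split file.

    Train lines look like 'Class/v_Class_g08_c01.avi 1'; test lines omit
    the trailing class index. Only the path part is used here.
    """
    database = {}
    for line in Path(list_path).read_text().splitlines():
        if not line.strip():
            continue
        video_path = line.split()[0]
        label, filename = video_path.split("/")
        video_name = Path(filename).stem  # drop the '.avi' extension
        database[video_name] = {"subset": subset, "annotations": {"label": label}}
    return database


def build_annotation(annotation_dir, split=1):
    """Assumed output layout: {'labels': [...], 'database': {video: {...}}}."""
    annotation_dir = Path(annotation_dir)
    database = {}
    database.update(read_split(annotation_dir / f"trainlist0{split}.txt", "training"))
    database.update(read_split(annotation_dir / f"testlist0{split}.txt", "validation"))
    return {
        "labels": load_labels(annotation_dir / "classInd.txt"),
        "database": database,
    }
```

Calling `build_annotation(annotation_dir_path, split=1)` and passing the result to `json.dump` would produce a file for the first train/test split, under the layout assumed here.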
Data is assumed to be stored under the following schema:
This project was maintained within Docker to ensure the correct installation of Distiller and other relevant packages. If running this code on Bath University's OGG Service, use the Docker image referenced in the example run below. The Dockerfile has been provided should you wish to build a similar image yourself.
Model configurations are given as follows:
ResNet-18 : --model resnet --model_depth 18 --resnet_shortcut A
ResNet-50 : --model resnet --model_depth 50 --resnet_shortcut B
ResNet-101 : --model resnet --model_depth 101 --resnet_shortcut B
MobileNetV2-1.0x : --model mobilenetv2 --width_mult 1.0
CSN-50 : --model csn --model_depth 50
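When scripting several runs, the configurations listed above can be kept in a small lookup. The following is a minimal sketch; the dictionary and the helper name `model_flags` are illustrative and not part of the repository, and only the flag values listed above are used.

```python
# Illustrative helper (not part of the repository): maps the model names
# listed above to the command-line flags that select them.
MODEL_CONFIGS = {
    "ResNet-18": ["--model", "resnet", "--model_depth", "18", "--resnet_shortcut", "A"],
    "ResNet-50": ["--model", "resnet", "--model_depth", "50", "--resnet_shortcut", "B"],
    "ResNet-101": ["--model", "resnet", "--model_depth", "101", "--resnet_shortcut", "B"],
    "MobileNetV2-1.0x": ["--model", "mobilenetv2", "--width_mult", "1.0"],
    "CSN-50": ["--model", "csn", "--model_depth", "50"],
}


def model_flags(name):
    """Return a copy of the flag list for a named configuration."""
    try:
        return list(MODEL_CONFIGS[name])
    except KeyError:
        known = ", ".join(sorted(MODEL_CONFIGS))
        raise ValueError(f"unknown model {name!r}; expected one of: {known}") from None
```

The returned list can be appended to the remaining arguments of a training or test command.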
Example code runs are saved in, and it is assumed this is used to run tests. Make sure to specify all parameters required for a given run, and make sure the bash script is executable (use 'chmod +x' on the script).
An example run is given as follows:
- Docker Code:
docker run --rm --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 -v "$(pwd)":/app -v "/mnt/slow0/ucf101/data":/data elliottloveridge/distiller /app/
- NVIDIA_VISIBLE_DEVICES defines which available GPU to use
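If runs are launched from Python rather than typed by hand, the same Docker invocation can be assembled as an argument list. This is only a sketch: the helper name `docker_run_command` and its parameters are hypothetical, and the script path is left to the caller. The flags, image name and mount points are those of the example above.

```python
import os
import shlex


def docker_run_command(script, gpu="0", data_dir="/mnt/slow0/ucf101/data",
                       image="elliottloveridge/distiller", project_dir=None):
    """Build the 'docker run' argument list used in the example above.

    `script` is the path, inside the container, of the bash script to run.
    `project_dir` defaults to the current directory, mirroring "$(pwd)".
    """
    if project_dir is None:
        project_dir = os.getcwd()
    return [
        "docker", "run", "--rm", "--ipc=host", "--runtime=nvidia",
        "-e", f"NVIDIA_VISIBLE_DEVICES={gpu}",  # which GPU the container sees
        "-v", f"{project_dir}:/app",            # code mounted at /app
        "-v", f"{data_dir}:/data",              # dataset mounted at /data
        image, script,
    ]


def as_shell_string(cmd):
    """Render the argument list as a copy-pastable shell command."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths that contain spaces.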
Then, contains the following examples:
- Training from scratch:
python /app/compressed-3d-cnn/ --root_path /data \
--video_path ucf101_videos/jpg/ \
--annotation_path /app/compressed-3d-cnn/annotation_UCF101/ucf101_01.json \
--result_path results \
--dataset ucf101 \
--n_classes 101 \
--batch_size 32 \
--model mobilenetv2 \
--width_mult 1.0 \
--learning_rate 0.1 \
--n_val_samples 1 \
--n_epochs 20
- Evaluation:
python /app/compressed-3d-cnn/utils/ --root_path /data \
--annotation_path /app/compressed-3d-cnn/annotation_UCF101/ucf101_01.json \
--dataset ucf101 \
--result_path results
Evaluation creates a folder for MMYY (see Dataset Dir above) and stores it within the relevant sub-folder ('benchmark' for the above example).
Example runs for model-compression methods are saved within, see for all required compression arguments
folder within compressed-3d-cnn
contains YAML files required by the compression scheduler
There are several augmentation techniques available. Please check and for the details of the augmentation methods.
Note: "RandomHorizontalFlip" and "RandomCrop" were used when training on UCF101.
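For video clips, a random augmentation is typically sampled once per clip and then applied to every frame, so that the frames stay spatially aligned. The sketch below illustrates that idea for a random crop and a horizontal flip, using plain nested lists in place of images. The function names are hypothetical and this is not the repository's transform code.

```python
import random


def random_crop_params(height, width, size, rng=random):
    """Pick the top-left corner of a size x size crop, once per clip."""
    top = rng.randint(0, height - size)   # randint is inclusive at both ends
    left = rng.randint(0, width - size)
    return top, left


def crop(frame, top, left, size):
    """Crop one frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def hflip(frame):
    """Mirror one frame left to right."""
    return [row[::-1] for row in frame]


def augment_clip(clip, size, flip_prob=0.5, rng=random):
    """Apply the same crop and the same flip decision to every frame."""
    height, width = len(clip[0]), len(clip[0][0])
    top, left = random_crop_params(height, width, size, rng)
    do_flip = rng.random() < flip_prob
    out = []
    for frame in clip:
        frame = crop(frame, top, left, size)
        out.append(hflip(frame) if do_flip else frame)
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the sampled crop and flip reproducible, which can help when debugging a data pipeline.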
To calculate video accuracy, first run the model in '--test' mode to create 'val.json'. Then run the evaluation script, as in the example above.
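Video-level accuracy compares the top-scoring predicted label for each video against its ground-truth label. The sketch below shows that calculation under an assumed file layout: 'val.json' holding `{"results": {video_id: [{"label": ..., "score": ...}, ...]}}` and the annotation file holding `{"database": {video_id: {"subset": ..., "annotations": {"label": ...}}}}`. Both layouts and the function names are assumptions for illustration, so check them against the files the repository actually writes.

```python
import json


def load_json(path):
    """Load a JSON file such as the annotation file or 'val.json'."""
    with open(path) as f:
        return json.load(f)


def top1_video_accuracy(annotation, predictions, subset="validation"):
    """Fraction of videos in `subset` whose best-scoring label is correct."""
    correct = 0
    total = 0
    for video_id, entry in annotation["database"].items():
        if entry["subset"] != subset:
            continue
        total += 1
        candidates = predictions["results"].get(video_id, [])
        if not candidates:
            continue  # a video with no prediction counts as wrong
        best = max(candidates, key=lambda c: c["score"])
        if best["label"] == entry["annotations"]["label"]:
            correct += 1
    return correct / total if total else 0.0
```

Under these assumptions, `top1_video_accuracy(load_json(annotation_path), load_json(val_json_path))` would return a value between 0 and 1.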
I'd like to thank Kensho Hara for releasing his codebase, the people who extended this work, and the team working on Distiller, who allowed for model compression to be implemented.